Foreword



The typical development of a successful theory in computer science traverses 
three sometimes overlapping phases: an experimental phase, where phenom- 
ena are studied almost in a trial and error fashion, a busy phase of realization, 
where people use the results of the experimental phase in an “uncoordinated” 
fashion, and a contemplative phase, where people look for the essence of what 
has been previously achieved. In compiler optimization these three phases
currently coexist. New heuristics are still being proposed and evaluated purely on
some benchmarks, and known techniques are still being implemented specif- 
ically for a new operating system or variants of programming languages, but 
increasingly many attempts now try to understand the full picture of compiler 
optimization in order to develop general frameworks and generators. 

This monograph is a typical contribution to third-phase activities in that
it presents a uniform framework capturing a large class of imperative
programming languages and their corresponding transformations, together with
directions for cookbook-style implementation. Thus, besides clarifying the
appropriateness and limitations of the methods considered, it also tries to
make these methods accessible even to non-experts.

More technically, the monograph addresses the issue of extension: which
principles are stable, i.e., remain valid when extending intraprocedurally suc- 
cessful methods to the interprocedural case, and what needs to be done in 
order to overcome the problems and anomalies arising from this extension. 
This investigation characterizes the power and flexibility of procedure mech- 
anisms from the data flow analysis point of view. 

Even though all the algorithms considered evolve quite naturally from
basic principles, which directly leads to accessible correctness and
optimality considerations, they often outperform their “tricky” handwritten
counterparts. Thus they constitute a convincing example of the superiority of
concept-driven software development.

The monograph presents a full formal development for so-called syntac- 
tic program analysis and transformation methods including complete proofs, 
which may be quite hard to digest in full detail. This rigorous development, 
on purpose structurally repetitive, is tailored to stress similarities and dif- 
ferences between the intraprocedural and interprocedural setting, down to 
the very last detail. However, the reader is not forced to follow the technical
level. Rather, details can be consulted on demand, providing students
with a deep yet intuitive and accessible introduction to central principles of 
intraprocedural and interprocedural optimization, compiler experts with pre- 
cise information about the obstacles when moving from the intraprocedural 
to the interprocedural case, and developers with concise specifications of
easy-to-implement yet high-performance interprocedural analyses.

Summarizing, this thesis can be regarded as a comprehensive account 
of what, from the practical point of view, are the most important program 
analysis and transformation methods for imperative languages. I therefore 
recommend it to everybody interested in a conceptual, yet far-reaching entry
into the world of optimizing compilers. 



Bernhard Steffen 
The present monograph is based on the doctoral dissertation of the author
[Kn1]. It presents a new framework for optimal interprocedural program
optimization, which covers the full range of language features of imperative
programming languages. It captures programs with (mutually) recursive pro- 
cedures, global, local, and external variables, value, reference, and procedure 
parameters. In spite of this unique generality, it is tailored for practical use. 
It supports the design and implementation of provably optimal program opti- 
mizations in a cookbook style. In essence, this is achieved by decomposing the 
design process of a program optimization and the proof of its optimality with 
respect to a specific optimality criterion into a small number of elementary 
steps, which can independently be proved using only knowledge about the 
specification of the optimization. This contrasts with heuristically based ap- 
proaches to program optimization, which are still dominant in practice, and 
often ad hoc. The application of the framework is demonstrated by means of 
the computationally and lifetime optimal elimination of partially redundant 
computations in a program, a practically relevant optimization, whose in- 
traprocedural variant is part of many advanced compiler environments. The 
purpose of considering its interprocedural counterpart is twofold. On the one 
hand, it demonstrates the analogies between designing intraprocedural and 
interprocedural optimizations. On the other hand, it reveals essential
differences which must usually be faced when extending intraprocedural
optimizations interprocedurally. Optimality criteria that can be satisfied in
the intraprocedural setting can be impossible to meet in the interprocedural
one. Optimization strategies that are successful in the intraprocedural
setting can fail interprocedurally. The elimination of partially redundant
computations is well suited for demonstrating this. In contrast to the
intraprocedural setting, computationally and lifetime optimal results are in
general impossible in the interprocedural setting. The placement strategies
leading to computationally and lifetime optimal results in the intraprocedural
setting can even fail to guarantee profitability in the interprocedural
setting. We propose a natural constraint
applying to a large class of programs, which is sufficient for the successful 
transfer of the intraprocedural elimination techniques to the interprocedural 
setting. Under this constraint, the resulting algorithm generates
interprocedurally computationally and lifetime optimal results, making it
unique. It is not only more powerful than its heuristic predecessors but also
more efficient, and, in the absence of procedures, it reduces to its
intraprocedural counterpart.

The remainder of this prologue summarizes the background of this monograph,
and provides a brief introduction to program optimization intended to make
its presentation more accessible to readers new to the field.

Optimizing Compilers. In essence, a compiler is a program translating
programs of some source language $L_1$ into semantically equivalent programs
of some target language $L_2$. One of the most typical applications of a compiler
is the translation of a source program written in a high-level programming 
language into a machine program (often simply called “machine code” or just 
“code”), which can be executed on the computer the compiler is implemented 
on. Of course, compilers are expected to produce highly efficient code, which 
has led to the construction of optimizing compilers [ASU, WG, Mor]. 




Fig. 1.1. Structure of an optimizing compiler 

Figure 1.1 illustrates the general structure of an optimizing compiler. 
The central component is called an optimizer. Basically, this is a program 
designed for detecting and removing inefficiencies in a program by means 
of appropriate performance improving transformations. Traditionally, these 
transformations are called program optimizations. This general term, how- 
ever, is slightly misleading because program optimization cannot usually be 
expected to transform a program of “bad” performance into a program of 
“good” or even “optimal” performance. There are two quite obvious reasons
for this limitation. First, “bad,” “good,” and “optimal” are qualitative
properties lacking a (precise) quantitative meaning. Second, interpreting the
term optimization naively does not impose any restrictions on the kind of
transformations considered possible, such as the restrictions usually imposed
by automation requirements. Following the naive interpretation, optimization
would require replacing a sorting algorithm of quadratic time complexity,
$O(n^2)$, by a completely different sorting algorithm of complexity
$O(n \log n)$, i.e., one in which the second factor $n$ is replaced by
$\log n$. Optimizations of this kind would require a profound understanding of
the semantics of the program under consideration, which is usually far beyond
the capabilities of an automatic analysis.

The original domain of program optimization is different. Usually, it leaves 
the inherent structure of the algorithms invariant, and improves their perfor- 
mance by avoiding or reducing the computational effort at run-time, or by 
shifting it from the run-time into the compile-time. Typical examples are loop 
invariant code motion, strength reduction, and constant folding. Loop
invariant code motion moves computations that always yield the same value
inside a loop to a program point outside of it, which avoids unnecessary
recomputations of the value at run-time. Strength reduction replaces operations that
are “expensive” by “cheaper” operations, which reduces the computational 
effort at run-time. Constant folding evaluates and replaces complex compu- 
tations, whose operands are known at compile-time, by their values, which 
shifts the computational effort from the run-time to the compile-time of the 
program. 
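As an illustration of constant folding, the following minimal sketch folds an expression tree bottom-up, replacing every operator application whose operands are both constants by its value. The classes `Const`, `Var`, and `BinOp` and the function `fold` are hypothetical names introduced only for this sketch; a real optimizer works on its own intermediate representation and must also respect the target's arithmetic (overflow, division by zero), which is ignored here.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression tree, used only for this sketch.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, BinOp]

_EVAL = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
}


def fold(e: Expr) -> Expr:
    """Replace every subexpression whose operands are known at compile-time by its value."""
    if isinstance(e, BinOp):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Const) and isinstance(right, Const):
            # Evaluated once by the compiler instead of on every run.
            return Const(_EVAL[e.op](left.value, right.value))
        return BinOp(e.op, left, right)
    return e  # constants and variables are left unchanged


# x * (2 + 3): the subterm 2 + 3 is folded; x is only known at run-time.
print(fold(BinOp("*", Var("x"), BinOp("+", Const(2), Const(3)))))
# prints BinOp(op='*', left=Var(name='x'), right=Const(value=5))
```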

In practice, the power of an optimization is often validated by means of
benchmark tests, i.e., by measuring the performance gain on a sample of
programs in order to provide empirical evidence of its effectiveness. The
limitations of this approach are obvious. It cannot reveal how “good” an
optimization really is with respect to the optimization potential actually
present. In addition, it is questionable to what extent an observed
performance improvement can be considered a reliable prediction in general.
This would require the sample programs to be “statistically representative”,
because the performance gain of a specific optimization depends highly on the
program under consideration.

In this monograph, we contrast this empirical approach with a mathematical
approach, which focuses on proving the effectiveness of an optimization.
Central to it are the introduction of formal optimality criteria and the proof
of the effectiveness or even optimality of an optimization with respect to the
criteria considered. Usually, these criteria exclude the existence of a certain
kind of inefficiency. Following this approach, optimality acquires a formal
meaning. An optimization satisfying a specific optimality criterion guarantees
that a program subjected to it cannot be improved any further with respect to
this criterion, i.e., with respect to the source of inefficiency it addresses.
Thus, rather than aiming to guarantee a specific percentage of performance
improvement, our approach guarantees that a specific kind of inefficiency is
provably absent after optimization.
Data Flow Analysis. Optimization must preserve semantics. It is thus
usually preceded by a static analysis of the argument program, called
data flow analysis (DFA), which checks the side-conditions under which an
optimization is applicable. For imperative programming languages like Algol, 
Pascal, or Modula, an important classification of DFA techniques is derived 
from the treatment of programs with procedures. Intraprocedural DFA is 
characterized by a separate and independent investigation of the procedures 
of a program making explicit worst-case assumptions for procedure calls. In- 
terprocedural DFA takes the semantics of procedure calls into account, and is 
thus theoretically and practically much more ambitious than intraprocedural 
DFA. In contrast, local DFA, which considers only (maximal sequences of)
straight-line code, so-called basic blocks, each investigated separately and
independently, is considerably simpler, but also less powerful than
intraprocedural and interprocedural DFA. As opposed to local DFA,
intraprocedural and interprocedural DFA are also called global DFA. Figure 1.2 illustrates this
classification of DFA techniques, which carries over to program optimization, 
i.e., local, intraprocedural, and interprocedural optimization are based on 
local, intraprocedural, and interprocedural DFA, respectively. 
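Local DFA operates on basic blocks. A minimal sketch of the classic partitioning rule, assuming a hypothetical instruction list in which jumps are tuples `("goto", target)` or `("if", condition, target)` with targets given as instruction indices: an instruction starts a new block if it is the first one, the target of a jump, or the successor of a jump. The function name `basic_blocks` is illustrative.

```python
from typing import List


def basic_blocks(code: List[tuple]) -> List[List[int]]:
    """Partition instruction indices into basic blocks (maximal straight-line sequences)."""
    if not code:
        return []
    leaders = {0}  # the first instruction starts a block
    for i, ins in enumerate(code):
        if ins[0] in ("goto", "if"):
            leaders.add(ins[-1])  # a jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)  # so does the instruction following a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Hypothetical three-address code of a small loop; jump targets are indices.
program = [
    ("assign", "i", 0),        # 0: i := 0
    ("if", "i >= n", 5),       # 1: if i >= n goto 5
    ("assign", "x", "a + b"),  # 2: x := a + b
    ("assign", "i", "i + 1"),  # 3: i := i + 1
    ("goto", 1),               # 4: goto 1
    ("return", "x"),           # 5: return x
]
print(basic_blocks(program))  # [[0], [1], [2, 3, 4], [5]]
```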




Fig. 1.2. Taxonomy of data flow analysis 

DFA is usually performed on an intermediate program representation. A 
flexible and widely used representation is the control flow graph (CFG) of
a program. This is a directed graph whose nodes and edges represent the
statements and the branching structure of the underlying program. Figure
1.3 shows an illustrative example. In order to avoid undecidability of DFA,
the branching structure of a CFG is usually interpreted nondeterministically.
This means that whenever control reaches a branch node, it is assumed that
the program execution can be continued with any successor of the branch
node within the CFG. Programs containing several procedures can naturally 
be represented by systems of CFGs. The control flow caused by procedure 
calls can be made explicit by combining them into a single graph, the
interprocedural flow graph, intuitively by connecting the call sites with the
flow graphs representing the called procedures.
Fig. 1.3. Control flow graph

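Combining a system of flow graphs into a single interprocedural flow graph can be sketched as follows. The encoding (the `FlowGraph` class, and call nodes mapped to the called procedure and a return node) is a hypothetical one chosen for this sketch and ignores parameters and variables; the reachability search reflects the nondeterministic interpretation of branches.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (procedure name, node label inside that procedure)


@dataclass
class FlowGraph:
    start: str
    end: str
    edges: List[Tuple[str, str]]
    # Hypothetical encoding: call node label -> (called procedure, return node label).
    calls: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def interprocedural_graph(system: Dict[str, FlowGraph]) -> Dict[Node, Set[Node]]:
    """Combine a system of flow graphs into one graph by linking call sites to callees."""
    succ: Dict[Node, Set[Node]] = {}

    def add(u: Node, v: Node) -> None:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    for p, g in system.items():
        for u, v in g.edges:
            if u in g.calls and v == g.calls[u][1]:
                continue  # the return point is reached only through the callee
            add((p, u), (p, v))
        for c, (callee, ret) in g.calls.items():
            q = system[callee]
            add((p, c), (callee, q.start))  # call site -> start of called procedure
            add((callee, q.end), (p, ret))  # end of called procedure -> return point
    return succ


def reachable(succ: Dict[Node, Set[Node]], src: Node) -> Set[Node]:
    """Nodes reachable from src when every branch may take any successor."""
    seen, work = {src}, deque([src])
    while work:
        for v in succ.get(work.popleft(), set()):
            if v not in seen:
                seen.add(v)
                work.append(v)
    return seen


system = {
    "main": FlowGraph("m0", "m3", [("m0", "m1"), ("m1", "m2"), ("m2", "m3")],
                      calls={"m1": ("p", "m2")}),
    "p": FlowGraph("p0", "p1", [("p0", "p1")]),
}
g = interprocedural_graph(system)
print(sorted(reachable(g, ("main", "m0"))))
```

If a procedure is called from several call sites, this plain combination also contains paths that enter the procedure from one call site and leave it to another. Such paths do not correspond to any program execution, which is one reason why interprocedural DFA cannot simply treat the combined graph as an ordinary CFG.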
Code Motion: A Practically Relevant Optimization. Code motion is 
one of the most widely used program optimizations in practice, for which 
there are two quite natural optimization goals concerning the number of 
computations performed at run-time, and the lifetimes of temporaries, which 
are unavoidably introduced as a side-effect of the transformation. Code
motion is thus well suited for demonstrating the practicality of our
optimization framework, which is designed for supporting the construction of
provably optimal optimizations. The code motion transformation we develop
(interprocedurally, subject to a natural side-condition) satisfies both
optimality criteria informally sketched above: it generates programs which are
computationally and lifetime optimal. The transformation meeting these
criteria is not only unique, it is even more efficient than its heuristic
predecessors. In the following, we illustrate the central idea underlying this
transformation in the intraprocedural context.




Fig. 1.4. A first code motion optimization 

In essence, code motion improves the efficiency of a program by avoid- 
ing unnecessary recomputations of values at run-time. For example, in the 
program of Figure 1.3 the computation of a + b at node 10 always yields
the same value. Thus, it is unnecessarily recomputed if the loop is executed
more than once at run-time. Code motion eliminates unnecessary recomputations
by replacing the original computations of a program by temporaries
(or registers), which are correctly initialized at appropriate program points.
For example, in the program of Figure 1.3 the original computations of a + b
occurring at the nodes 10, 16, and 17 can be replaced by a temporary h,
which is initialized by a + b at the nodes 8 and 9 as illustrated in Figure 1.4.
Admissible Code Motion

Code motion must preserve the semantics of the argument program. This
leads to the notion of admissible code motion. Intuitively, admissibility
requires that the temporaries introduced for replacing the original
computations of a program are correctly initialized at certain program points
as illustrated above. In addition, it requires that the initializations of the
temporaries do not introduce computations of new values on paths where the
original program does not compute them, because this could introduce new
run-time errors. Illustrating this by means of the program of Figure 1.3, the
second requirement would be violated by initializing the temporary h at node 5
as shown in Figure 1.5. This introduces a computation of a + b on the path
(1, 4, 5, 7, 18), which is free of a computation of a + b in the original
program. Under the admissibility requirement, we can obtain computationally
and lifetime optimal results as indicated below.




Fig. 1.5. No admissible code motion optimization 
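The second admissibility requirement is closely related to a backward property usually called down-safety (or anticipability) in the code motion literature: an initialization h := a + b at the entry of a node n cannot introduce a new value on any path if n is down-safe, i.e., if every path from n to the end node computes a + b before a or b is modified. The following minimal sketch computes this property as a greatest fixed point on a small hypothetical flow graph, not the one of Figure 1.3. Here `comp` is the set of nodes computing a + b and `transp` the set of nodes not modifying a or b; both names are assumptions of the sketch.

```python
from typing import Dict, List, Set


def down_safe(succ: Dict[int, List[int]], comp: Set[int], transp: Set[int], end: int) -> Set[int]:
    """Nodes at whose entry a + b is computed on every path to `end`
    before an operand is modified (greatest fixed point, backward, all paths)."""
    safe = set(succ)  # optimistic start: assume every node is down-safe
    changed = True
    while changed:
        changed = False
        for n in succ:
            holds = n in comp or (
                n in transp and n != end and all(m in safe for m in succ[n])
            )
            if n in safe and not holds:
                safe.discard(n)
                changed = True
    return safe


# Hypothetical flow graph: only node 3 computes a + b, no node modifies a or b.
succ = {1: [2, 4], 2: [3], 3: [6], 4: [5], 5: [6], 6: []}
print(sorted(down_safe(succ, comp={3}, transp=set(succ), end=6)))  # [2, 3]
```

Node 1 is not down-safe because of the path (1, 4, 5, 6), which is free of a computation of a + b; initializing h there could introduce a computation the original program never performs on that path, just as at node 5 in Figure 1.5.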
Computationally Optimal Code Motion

Intuitively, an admissible code motion is computationally optimal if the
number of computations on every program path cannot be reduced any further by
means of admissible code motion. Achieving computationally optimal results 
is the primary goal of code motion. The central idea to meet this goal is to 
place computations 

— as early as possible, while maintaining admissibility. 

This is illustrated in Figure 1.6, which shows the program that results from
the program of Figure 1.3 by means of the “as-early-as-possible” placing
strategy. All unnecessary recomputations of a + b are avoided by storing the
value of a + b in the temporary h and replacing all original computations of
a + b by h. Note that this program cannot be improved any further: it is
computationally optimal.

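The “as-early-as-possible” strategy can be made precise in several ways. The following sketch uses a node-based formulation common in the literature on busy and lazy code motion, assumed here rather than taken from this monograph: a node is earliest if it is down-safe and the value could not have been provided earlier, because it is the start node or some predecessor either modifies an operand or is neither down-safe nor up-safe. It assumes that every edge entering a node with several predecessors has been split by an empty synthetic node; the flow graph, the node names, and the function names are hypothetical. Busy code motion then initializes h at the entry of every earliest node and replaces all original computations by h.

```python
from typing import Callable, Dict, Iterable, List, Set

Graph = Dict[str, List[str]]


def predecessors(succ: Graph) -> Graph:
    pred: Graph = {n: [] for n in succ}
    for n, ms in succ.items():
        for m in ms:
            pred[m].append(n)
    return pred


def greatest_fixpoint(nodes: Iterable[str], holds: Callable[[str, Set[str]], bool]) -> Set[str]:
    """Start from 'the predicate holds everywhere' and drop violating nodes until stable."""
    nodes = list(nodes)
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n in result and not holds(n, result):
                result.discard(n)
                changed = True
    return result


def earliest_points(succ: Graph, start: str, end: str, comp: Set[str], transp: Set[str]) -> Set[str]:
    """Insertion points of busy code motion for a single term such as a + b."""
    pred = predecessors(succ)
    # Down-safe: computed on every path to `end` before an operand changes.
    dsafe = greatest_fixpoint(succ, lambda n, s: n in comp or (
        n in transp and n != end and all(m in s for m in succ[n])))
    # Up-safe: computed on every path from `start` and not invalidated since.
    upsafe = greatest_fixpoint(succ, lambda n, s: n != start and all(
        m in transp and (m in comp or m in s) for m in pred[n]))
    safe = dsafe | upsafe
    # Earliest: down-safe, and the value could not be provided any earlier.
    return {n for n in dsafe if n == start or any(
        m not in transp or m not in safe for m in pred[n])}


# Hypothetical loop; edges entering the join node 2 are split by empty nodes s12 and s32.
succ = {"1": ["s12"], "s12": ["2"], "2": ["3", "4"], "3": ["s32"], "s32": ["2"], "4": []}
comp = {"3", "4"}           # nodes 3 and 4 compute a + b
transp = set(succ) - {"1"}  # node 1 assigns to a
print(earliest_points(succ, "1", "4", comp, transp))  # {'s12'}
```

On this loop, h is initialized once at the synthetic node s12, i.e., on the edge entering the loop right after the assignment to a at node 1, and the computations at nodes 3 and 4 are replaced by h.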
Lifetime Optimal Code Motion 

The “as-early-as-possible” placing strategy moves computations even if there 
is no run-time gain. In the running example this is particularly obvious when 
considering the computation of a + b at node 3, which is moved without
any run-time gain. Though unnecessary code motion does not increase the 
number of computations on a path, it can be the source of superfluous register 
pressure, which is a major problem in practice. The secondary goal of code 
motion therefore is to avoid any unnecessary motions of computations while 
maintaining computational optimality. This is illustrated in Figure 1.7 for 
the running example of Figure 1.3. 

Like the program of Figure 1.6, it is computationally optimal. However, 
computations are only moved if this is profitable: the computations of a + b at
nodes 3 and 17, which cannot be moved with run-time gain, are not touched
at all. The problem of unnecessary code motions is addressed by the criterion
of lifetime optimality. Intuitively, a computationally optimal code motion
transformation is lifetime optimal if the lifetimes of temporaries cannot
be reduced any further by means of computationally optimal code motion.
In other words, in any other program resulting from a computationally
optimal code motion transformation, the lifetimes of temporaries
are at least as long as in the lifetime optimal one. The central idea to achieve
lifetime optimality is to place computations 

— as late as possible, while maintaining computational optimality. 

The “as-late-as-possible” placing strategy transforms computationally optimal
programs into a unique lifetime optimal program. This is an important
difference from computational optimality. Whereas computationally optimal
results can usually be achieved by several transformations, lifetime optimality
is achieved by a single transformation only.

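Given the earliest points, the “as-late-as-possible” strategy delays each initialization as long as this does not reintroduce a recomputation. The following minimal sketch follows a common node-based formulation from the lazy code motion literature (assumed, not quoted from this monograph): a node is delayed if every path from the start node to it passes an earliest point without a subsequent computation of the term, and latest if it is delayed and either computes the term or has a successor that is not delayed. The earliest points are supplied by hand for a hypothetical diamond-shaped flow graph in which node 2 modifies a and join edges are split by empty nodes.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def latest_points(succ: Graph, start: str, comp: Set[str], earliest: Set[str]) -> Set[str]:
    """Delay the earliest insertion points of a term as far as computational
    optimality permits (forward, all paths, greatest fixed point)."""
    pred: Graph = {n: [] for n in succ}
    for n, ms in succ.items():
        for m in ms:
            pred[m].append(n)
    # Delayed(n): every path from `start` to n passes an earliest point
    # with no computation of the term in between.
    delayed = set(succ)
    changed = True
    while changed:
        changed = False
        for n in succ:
            holds = n in earliest or (n != start and all(
                m not in comp and m in delayed for m in pred[n]))
            if n in delayed and not holds:
                delayed.discard(n)
                changed = True
    # Latest(n): n computes the term itself, or some successor is not delayed.
    return {n for n in delayed
            if n in comp or any(m not in delayed for m in succ[n])}


# Hypothetical diamond: node 2 assigns to a, node 4 computes a + b.
succ = {"1": ["2", "3"], "2": ["s24"], "3": ["s34"],
        "s24": ["4"], "s34": ["4"], "4": ["5"], "5": []}
print(latest_points(succ, "1", comp={"4"}, earliest={"3", "s24"}))  # {'4'}
```

Busy code motion would initialize h at nodes 3 and s24; delaying moves both initializations down to node 4, where a + b is computed anyway, so the computation stays where it is, like those at nodes 3 and 17 in Figure 1.7. Complete formulations of lazy code motion additionally suppress such isolated initializations, whose value is only used at the insertion point itself; this step is omitted here.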
Figures 1.8 and 1.9 illustrate the lifetime ranges of the temporary h for 
the programs of Figures 1.6 and 1.7, respectively. 
Fig. 1.6. A computationally optimal program

Summarizing, the “as-early-as-possible” code motion transformation of
Figure 1.6 moves computations as far as possible in order to achieve
computationally optimal results; the “as-late-as-possible” code motion
transformation of Figure 1.7 moves computations only as far as necessary.
Therefore, we call the first transformation the busy code motion transformation
and the second one the lazy code motion transformation, or for short the BCM-
and LCM-transformation.

In this monograph, we will show how to construct intraprocedural and
interprocedural program optimizations like the BCM- and LCM-transformation
systematically. However, we also demonstrate that essential differences
usually have to be taken into account when extending intraprocedural
optimizations interprocedurally. We illustrate this by developing the
interprocedural counterparts of the BCM- and LCM-transformation for programs
with recursive procedures, global, local, and external variables, and value,
reference, and procedure parameters. We show that computationally and lifetime
optimal results are in general impossible interprocedurally. Therefore, we
propose a natural constraint which is sufficient to meet both criteria for a
large class of programs. The resulting algorithms are unique in achieving
interprocedurally computationally and lifetime optimal results for this
program class. Their power is illustrated by a complex example in Section 10.6.
Additionally, a detailed account of the example considered in the prologue for
illustrating the intraprocedural versions of busy and lazy code motion can be
found in Section 3.5.

Fig. 1.7. The computationally and lifetime optimal program
Fig. 1.8. Lifetime ranges after the BCM-transformation
1. Preface



1.1 Summary 

A new framework for interprocedural program optimization is presented, 
which is tailored for supporting the construction of interprocedural program 
optimizations satisfying formal optimality criteria in a cookbook style. The 
framework is unique in capturing programs with statically nested (mutu- 
ally) recursive procedures, global and local variables, value, reference, and 
procedure parameters. In addition, it supports separate compilation and the 
construction of software libraries by dealing with external procedures and 
external variables. An important feature of the framework is that it strictly 
separates the specification of an optimizing transformation and the proof of 
its optimality from the specification of the data flow analysis algorithms com- 
puting the program properties involved in the definition of the transformation 
and the proofs of their precision. This structures and simplifies the develop- 
ment of optimal program transformations, and allows us to hide all details of 
the framework which are irrelevant for application. In particular, this holds 
for the higher order data flow analysis concerning formal procedure calls, 
which is organized as an independent preprocess. The power and flexibility 
of the framework are demonstrated by a practically relevant optimization: the
computationally and lifetime optimal elimination of interprocedurally par- 
tially redundant computations in a program. As a side-effect this application 
reveals essential differences, which usually must be taken into account when 
extending intraprocedural optimizations interprocedurally. Concerning the 
application considered here, this means that computationally and lifetime
optimal results are in general impossible interprocedurally. However, we
propose a natural constraint which is sufficient to meet these optimality criteria
for a large class of programs. Under this constraint the algorithm developed is 
not only unique in satisfying both optimality criteria, it is also more efficient 
than its heuristic predecessors. 



1.2 Motivation 

Program optimization is traditionally the general term for program
transformations that are intended to improve the run-time or the storage
efficiency of a program.¹ In the imperative programming paradigm, an important
classification of program optimization is derived from the treatment of programs
with procedures. Intraprocedural program optimization is characterized by a 
separate and independent investigation of the procedures of a program where 
explicit worst-case assumptions are made for procedure calls. Interprocedu- 
ral program optimization takes the semantics of procedure calls into account, 
and is thus theoretically and practically more ambitious than intraprocedural 
optimization. 

Ideally, optimizing program transformations preserve the semantics of the
argument program, improve its run-time efficiency, and satisfy formal
optimality criteria. This is worth noting because in practice heuristically
based transformations are still dominant. Even transformations which some- 
times impair the run-time efficiency are considered program optimizations. 
In [CLZ] they are called non-strict in contrast to strict optimizations, which 
are required to be always run-time improving. In contrast to this pragmatic 
approach, we are interested in optimal program optimization, i.e., in program 
transformations which are strict in the sense of [CLZ], and provably optimal 
with respect to formal optimality criteria. 

The construction of provably optimal program transformations has been 
studied in detail in the intraprocedural case. Conceptually, it is important 
to separate the specification of a program transformation and the proof of 
its optimality from the specification of the data flow analysis (DFA) algo- 
rithms computing the program properties involved in the definition of the 
transformation, and the proof that they compute these properties precisely. 
This leads to the following two-step structure: 

1. Specify a program transformation and prove its optimality with respect 
to a specific formal optimality criterion of interest. 

2. Specify the DFA-algorithms and prove that they precisely compute the 
program properties involved in the definition of the transformation of the 
first step. 

The first step can be directed by general guidelines fixing the elementary
steps which are necessary for proving the optimality of a transformation. 
The details, of course, depend on the concrete transformation and the op- 
timality criterion under consideration. The second step can be organized in 
greater detail. In essence, its specification part reduces to specifying the DFA- 
information of interest for a given program property, and the way in which 
it is computed by the elementary statements of a procedure. The concrete 
DFA-algorithm then results automatically from instantiating a generic
DFA-algorithm with the specification. Proving its precision for the program
property under consideration can be split into three elementary substeps, which
only concern the domain of the specification and can be proved independently.
The theory of abstract interpretation (cf. [CC1, CCS, CC4, Ma, Nie1, Nie2]),
and the well-known Coincidence Theorem 2.2.2 of Kildall [Kil] and Kam and
Ullman [KU2], which gives a sufficient condition for the coincidence of the
specifying meet over all paths solution and the algorithmic maximal fixed point
solution of a DFA-problem, are central for the simplicity and elegance of this
approach.

¹ In general there is a trade-off between run-time and storage improving
transformations (e.g., procedure inlining, loop unrolling). In this monograph
we focus on the run-time, which is the major concern in practice.

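The two-step structure can be illustrated by a minimal sketch of a generic solver: the specification consists only of the initial information at the start node, the lattice (here finite sets with intersection as meet), and a local semantic function per node, while the worklist algorithm computes the maximal fixed point solution. For comparison, the meet over all paths solution is computed by enumerating the paths of an acyclic graph. The gen/kill functions used are distributive, so by the Coincidence Theorem both solutions coincide, which the loop at the end checks for a hypothetical available-expressions example. All names are illustrative and do not reproduce the monograph's generic algorithm.

```python
from functools import reduce
from operator import and_
from typing import Callable, Dict, FrozenSet, List

Fact = FrozenSet[str]
Graph = Dict[str, List[str]]
Transfer = Callable[[Fact], Fact]


def gen_kill(gen: set, kill: set) -> Transfer:
    """Local semantic function of a node in a bit-vector problem."""
    return lambda x: (x - kill) | gen


def mfp(succ: Graph, start: str, init: Fact, top: Fact,
        transfer: Dict[str, Transfer], meet: Callable[[Fact, Fact], Fact]) -> Dict[str, Fact]:
    """Generic worklist algorithm computing the maximal fixed point at node entries."""
    info = {n: top for n in succ}
    info[start] = init
    work = list(succ)  # process every node at least once
    while work:
        n = work.pop()
        out = transfer[n](info[n])
        for m in succ[n]:
            new = meet(info[m], out)
            if new != info[m]:
                info[m] = new
                work.append(m)
    return info


def mop(succ: Graph, start: str, init: Fact, transfer: Dict[str, Transfer],
        meet: Callable[[Fact, Fact], Fact], target: str) -> Fact:
    """Meet over all paths from start to the entry of target (acyclic graphs only)."""
    facts: List[Fact] = []

    def walk(n: str, fact: Fact) -> None:
        if n == target:
            facts.append(fact)
        for m in succ[n]:
            walk(m, transfer[n](fact))

    walk(start, init)
    return reduce(meet, facts)


# Available expressions on a hypothetical diamond: node 1 computes a + b and c * d,
# node 2 assigns to a, nodes 3 and 4 affect neither expression.
succ = {"1": ["2", "3"], "2": ["4"], "3": ["4"], "4": []}
transfer = {"1": gen_kill({"a+b", "c*d"}, set()), "2": gen_kill(set(), {"a+b"}),
            "3": gen_kill(set(), set()), "4": gen_kill(set(), set())}
universe = frozenset({"a+b", "c*d"})
solution = mfp(succ, "1", frozenset(), universe, transfer, and_)
for n in succ:  # distributive problem: MFP and MOP coincide
    assert solution[n] == mop(succ, "1", frozenset(), transfer, and_, n)
print(sorted(solution["4"]))  # ['c*d']
```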
1.2.1 The Framework 

We present a framework for interprocedural program optimization, which 
evolves as an extension and generalization of the stack-based framework for 
interprocedural data flow analysis (IDFA) of [KS1]. The new framework
applies to Algol-like programming languages, and supports the optimization of
complete programs as well as of program modules, which is required for sep- 
arate compilation and the construction of software libraries. It is unique in 
capturing programs with 

— statically nested mutually recursive procedures, 

— global and local variables, 

— value, reference, and procedure parameters, and 

— external variables and procedures. 

The new framework maintains the two-step structure of intraprocedural program
optimization: the specification and the optimality proof of an interprocedural
program transformation are separated from the specification and the
precision proofs of the IDFA-algorithms computing the program properties
involved in the definition of the transformation under consideration.
Moreover, both steps are organized like their intraprocedural counterparts.
This means that the first step is directed by general guidelines fixing the
obligations for specifying and proving the optimality of an interprocedural
program transformation. The second step, considering the specification part
first, reduces, as in the intraprocedural case, essentially to specifying the
DFA-information of interest for a given program property, and the way in which
it is computed by the elementary statements of a program. The only difference
from the intraprocedural setting is the necessity of specifying the effect of
return nodes in order to deal with local variables and value parameters of
recursive procedures. As in the intraprocedural case, the concrete
IDFA-algorithm results automatically from instantiating the generic
IDFA-algorithms of the interprocedural framework with the specification.
Proving its precision for the property under consideration requires only a
single step in addition to the intraprocedural case. This step is concerned
with the effect of return nodes. The precision proof of the generated
IDFA-algorithm consists of four elementary substeps, whose proofs are usually
as straightforward as in the intraprocedural case and concern the domain of
the specification only. Central for achieving this is the Interprocedural
Coincidence Theorem of [KS1], which is an interprocedural generalization of
the Coincidence Theorem of [Kil, KU2]. It gives a sufficient condition for
the coincidence of the specifying interprocedural meet over all paths (IMOP) 
solution and the algorithmic interprocedural maximal fixed point (IMFP) 
solution of an IDFA-problem. In comparison to the presentation of [KS1],
the framework and the coincidence theorem are extended in order to deal
with static procedure nesting, external variables and procedures, and
reference and procedure parameters. Whereas the extension to static procedure
nesting and to external variables and procedures is straightforward, the
extension to procedure parameters is more intricate, and requires a higher
order data flow analysis (HO-DFA). As a side-effect, it turns out that the
HO-DFA also uniformly covers reference and name parameters, offering a
conceptually new approach to the computation of alias information.

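The additional obligation of specifying the effect of return nodes can be illustrated with a minimal sketch for a hypothetical property, the set of definitely initialized variables. A return function combines the information valid at the call site with that valid at the end of the called procedure; facts about the caller's locals must come from the call site, since a recursive call operates on fresh incarnations of them. The function `return_effect` and its parameters are assumptions of this sketch, which also assumes the absence of reference parameters and of accesses to the caller's locals by statically nested procedures.

```python
from typing import FrozenSet, Set

# Hypothetical property: the set of variables that are definitely initialized.
Facts = FrozenSet[str]


def return_effect(at_call: Facts, at_callee_exit: Facts,
                  caller_locals: Set[str], callee_locals: Set[str]) -> Facts:
    """Information after a call, combining the call site with the callee's exit.

    Facts about the caller's local variables and value parameters are taken from
    the call site, because the callee (possibly a recursive incarnation of the same
    procedure) works on fresh incarnations of them. Facts about the remaining
    (global) variables are taken from the callee's exit; facts about the callee's
    own locals are dropped, since they die on return.
    """
    kept_locals = frozenset(v for v in at_call if v in caller_locals)
    globals_after = frozenset(
        v for v in at_callee_exit if v not in caller_locals and v not in callee_locals)
    return kept_locals | globals_after


# Recursive procedure with local x and global g: before the recursive call both are
# initialized; at the callee's exit only g is known to be (its x is a fresh copy).
print(sorted(return_effect(frozenset({"x", "g"}), frozenset({"g"}), {"x"}, {"x"})))  # ['g', 'x']
```

Simply passing the information at the callee's exit on to the return point, as for an ordinary edge, would lose the fact that x is initialized.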
Higher Order Data Flow Analysis. The central idea to handle formal
procedure calls in our framework is to consider formal procedure calls as
“higher order” branch statements and to interpret them nondeterministically
during IDFA. This can be considered the natural analogue to the
nondeterministic treatment of ordinary branch statements in intraprocedural
DFA. Moreover, it allows us to organize the HO-DFA as a preprocess of the
“usual” IDFA, and to hide all details of the HO-DFA from the subsequent IDFA.
Its intent is to determine for every formal procedure call in a program the
set of procedures it can call at run-time as precisely as possible. This is
closely related to approaches for constructing the procedure call graph of a
program (cf. [CCHK, HK, Lak, Ry, Wal]).² These approaches, however, are mostly
heuristically based,³ and concentrate on the correctness (safety) of their
approximation. They do not systematically deal with precision or decidability
in general. This contrasts with our approach, in which correctness and
precision, the theoretical and practical limitations of computing the set of
procedures that can be invoked by a formal procedure call, and the integration
of the results obtained into IDFA are central. We show that the problem of
determining the set of procedures which can be invoked by a formal procedure
call is a refinement of the well-known formal reachability problem (cf. [La1]).
We therefore call the refined problem the formal callability problem. Formal
callability yields the solution of the formal reachability problem as a
by-product. This is important because, given the well-known undecidability of
formal reachability in general (cf. [La1]), it directly implies that formal
callability is in general not decidable either. It is true that formal
reachability is decidable for finite mode languages (e.g., ISO-Pascal [ISO])
[La1],⁴ but the mode depth dramatically affects the computational complexity:
whereas for programming languages of mode depth 2 (e.g., Wirth’s Pascal
[Wth, HW]) and a limit on the length of parameter lists, formal reachability
is decidable in quadratic time [Ar3], it is PSPACE-hard in the general case of
unbounded finite mode depth languages [Ar3, Wi2].

² In [Lak] a different set-up is considered, with procedure-valued variables
instead of procedure parameters.

³ For example, the algorithm of [Ry] is restricted to programs without recursion.

⁴ Intuitively, in a program with finite mode depth, procedure parameters can be
completely specified without using mode equations, as is necessary for certain
procedure parameters of programs with infinite mode depth (Consider e.g. self-

Motivated by the theoretical limitations of deciding formal callability, and 
the practical limitations imposed by the efficiency requirements of compilers, 
we introduce a correct (safe) approximation of formal callability, called
potential passability, which can be computed efficiently. Moreover, for
programs of mode depth 2 without global formal procedure parameters, our
approximation is precise, i.e., potential passability and formal callability
coincide.
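
To illustrate how the sets of procedures feeding a formal procedure call can be computed as a least fixed point, here is a minimal Python sketch in the spirit of potential passability. It is not the monograph's definition. It assumes a hypothetical program model (`PARAMS`, `CALLS`), no statically nested procedures (so a formal call in p can only refer to p's own formal parameters), procedure names distinct from parameter names, and it treats every call site as reachable, which keeps the result a safe over-approximation.

```python
from collections import defaultdict

# Hypothetical program model (an assumption of this sketch, not the monograph's syntax):
#   PARAMS[p] lists the formal procedure parameters of procedure p.
#   CALLS lists call sites (site, caller, callee, args); callee and every arg are either
#   a procedure name or a formal procedure parameter of the caller.
PARAMS = {"main": [], "apply": ["f"], "twice": ["g"], "inc": [], "dec": []}
CALLS = [
    ("c1", "main", "apply", ["inc"]),
    ("c2", "main", "twice", ["dec"]),
    ("c3", "twice", "apply", ["g"]),  # g is passed on as an actual parameter
    ("c4", "apply", "f", []),         # formal procedure call
]


def passable_sets(params, calls):
    """Least fixed point: which procedures may be bound to each formal parameter."""
    binding = defaultdict(set)  # (procedure, formal) -> set of procedure names

    def denotes(caller, name):
        # A procedure name denotes itself; a formal denotes a copy of its current binding set.
        return {name} if name in params else set(binding[(caller, name)])

    changed = True
    while changed:
        changed = False
        for _, caller, callee, args in calls:
            for target in denotes(caller, callee):  # every procedure this site may invoke
                for formal, arg in zip(params[target], args):
                    new = denotes(caller, arg) - binding[(target, formal)]
                    if new:
                        binding[(target, formal)] |= new
                        changed = True
    # Resolve each formal call site to the procedures it may invoke.
    resolved = {site: denotes(caller, callee)
                for site, caller, callee, _ in calls if callee not in params}
    return binding, resolved


binding, resolved = passable_sets(PARAMS, CALLS)
print({site: sorted(procs) for site, procs in resolved.items()})  # {'c4': ['dec', 'inc']}
```

Once `resolved` is known, the formal call site `c4` can be treated as a nondeterministic branch to calls of `inc` and `dec`, mirroring the idea of interpreting formal procedure calls as higher order branch statements.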

Reference and Name Parameters. Reference and name parameters can 
be regarded as parameterless procedure parameters. This observation is
the key to dealing uniformly with procedure, reference, and name
parameters.® Using this identification, the HO-DFA can directly be used for
computing (safe approximations of) the set of may-aliases of a reference
parameter. Moreover, it can easily be modified to compute (safe
approximations of) the set of must-aliases of a reference parameter as well.
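
Viewing reference parameters as parameterless procedure parameters, the same kind of propagation yields alias information. The Python sketch below is again based on hypothetical assumptions (only reference parameters, actual arguments are globals or the caller's own reference parameters, no static nesting, every call site reachable). It computes may-sets as a least fixed point of set union starting from the empty set, and must-sets as a greatest fixed point of set intersection starting from the set of all globals. It illustrates the modification from may to must information, not the monograph's algorithm.

```python
# Hypothetical program model (assumption of this sketch).
GLOBALS = {"a", "b"}
REF_PARAMS = {"main": [], "p": ["x"], "q": ["y"], "r": ["z"]}
CALLS = [  # (caller, callee, actual arguments)
    ("main", "p", ["a"]),
    ("main", "q", ["a"]),
    ("main", "q", ["b"]),
    ("p", "r", ["x"]),  # x is passed on by reference
    ("q", "r", ["y"]),
]


def alias_sets(ref_params, calls, globals_, must):
    """May: least fixed point of union from {}; must: greatest fixed point of intersection from all globals."""
    init = set(globals_) if must else set()
    info = {(p, f): set(init) for p, formals in ref_params.items() for f in formals}

    def denotes(caller, arg):
        # A global denotes itself; a reference parameter denotes the caller's current set.
        return {arg} if arg in globals_ else info[(caller, arg)]

    changed = True
    while changed:
        changed = False
        for (proc, formal) in info:
            pos = ref_params[proc].index(formal)
            incoming = [denotes(caller, args[pos]) for caller, callee, args in calls if callee == proc]
            if not incoming:
                continue  # never called: keep the initial value
            combined = set.intersection(*incoming) if must else set.union(*incoming)
            if combined != info[(proc, formal)]:
                info[(proc, formal)] = combined
                changed = True
    return info


may = alias_sets(REF_PARAMS, CALLS, GLOBALS, must=False)
must = alias_sets(REF_PARAMS, CALLS, GLOBALS, must=True)
print(sorted(may[("r", "z")]), sorted(must[("p", "x")]), sorted(must[("r", "z")]))
# ['a', 'b'] ['a'] []
```

Here x denotes a on every call, whereas z may denote a or b; its empty must-set simply means that no alias is guaranteed on every call.
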

Interprocedural Data Flow Analysis. In IDFA we focus on programs which
satisfy the strong formal most recent (sfmr) property (cf. [Ka]), a property
which holds trivially for programs without formal procedure calls.®
The point of concentrating on programs with this property is that in sfmr- 
programs^ formal procedure calls can be handled in a most recent fashion 
like ordinary procedure calls, and thus, as efficiently as ordinary procedure 
calls.® The validity of the sfmr-property guarantees that the simplified and 
more efficient treatment of formal procedure calls is correct: so-called most 
recent errors in accessing non-local variables, which are known from early 
implementations of run-time systems, do not occur (cf. [McG]). 
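
The run-time intuition behind the sfmr-property (each static pointer refers to the most recent activation record of the static predecessor) can be phrased as a check on a single snapshot of the run-time stack. The Python sketch below uses a hypothetical encoding of activation records. Note that the sfmr-property itself quantifies over all executions and is decided statically [Ka], so a snapshot check only illustrates the definition.

```python
# Hypothetical encoding (assumption of this sketch): a stack snapshot is a list of
# activation records (procedure, static_link) from bottom to top, where static_link
# is the stack index of the record the static pointer refers to (None for the main program).
STATIC_PARENT = {"main": None, "P": "main", "Q": "P"}  # Q is declared inside P


def static_links_most_recent(stack, static_parent):
    """True iff every static link points to the most recent record of the static predecessor."""
    for i, (proc, link) in enumerate(stack):
        parent = static_parent[proc]
        if parent is None:  # outermost program: no static link to check
            continue
        older = [j for j in range(i) if stack[j][0] == parent]
        if not older or link != older[-1]:
            return False
    return True


# Q called directly inside the second (most recent) activation of P.
ok = [("main", None), ("P", 0), ("P", 0), ("Q", 2)]
# Q passed as a procedure parameter out of the first activation of P and called
# inside the recursive activation: its static link still refers to index 1.
bad = [("main", None), ("P", 0), ("P", 0), ("Q", 1)]
print(static_links_most_recent(ok, STATIC_PARENT), static_links_most_recent(bad, STATIC_PARENT))
# True False
```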

We recall that the sfmr-property is decidable at compile-time [Ka]. However,
similar to formal reachability, the mode depth is crucial for the
computational complexity: for programming languages of mode depth 2 and a
limit on the length of parameter lists, the sfmr-property is decidable in
polynomial time [Ar2], but it is PSPACE-complete in the general case of
Algol-like languages, even if at most 4 local (procedure) parameters per
procedure are allowed [Wi1, Wi2]. Thus, for most practical applications,
additional criteria will be necessary which are sufficient for the
sfmr-property and can be checked efficiently (cf. [McG]). A simple syntactic
criterion is that a program does not contain statically nested procedures
[Wi2]. Of course, this looks arbitrary and quite restrictive. It is thus
worth noting that C is an example of a widely used programming language
which does not allow statically nested procedures. In addition, the nesting
criterion is also important because programs with statically nested
procedures can often effectively be transformed into formally equivalent
programs without statically nested procedures.® This is known as the
modularity property of a program (cf. [La2]).^® A program satisfies the
modularity property if and only if it has a regular formal call tree [Ol1].
Based on the technique of accompanying parameters of [La2], there is an
effective procedure that transforms an Algol-like program with a regular
formal call tree into a formally equivalent Algol-like program without
statically nested procedures [Ol1]. Unfortunately, the modularity property
is not decidable in general [Ol1].^^ Thus, the following result of [La2] is
particularly important here: there is an effective procedure transforming
every Algol-like program without global formal procedure parameters into a
formally equivalent Algol-like program without statically nested procedures.

application of a procedure π as in the call statement “call π(π)”.
Self-application is illegal in a language with finite mode depth!).

® Note that call-by-reference and call-by-name coincide as long as there are
no complex data structures.

® Intuitively, this means that in the run-time stack maintaining the activation
records of the procedures which have been called but not yet terminated, the
static pointer of each activation record created by a call of a procedure π
always refers to the most recent activation record created by a call of the
static predecessor of π (cf. [Ar2]). We remark that our HO-DFA does not rely
on this property.

^ I.e., programs satisfying the sfmr-property.

® In our approach to IDFA, programs are treated in the sense of the mr-copy
rule (“most recent” copy rule), which for sfmr-programs coincides with the
static scope copy rule in the sense of Algol60 (cf. [Ol2]).

Summarizing, the advantages of focusing IDFA on sfmr-programs are as
follows:

1. HO-DFA can be organized as a preprocess which can be hidden from
IDFA:

⇒ Formal procedure calls do not affect the construction and the efficiency
of IDFA-applications.

2. Formal procedure calls can be interpreted as nondeterministic higher
order branch statements which can be treated in a most recent fashion
like ordinary procedure calls:

⇒ Formal procedure calls fit uniformly into the standard techniques of
static program analysis, and can be treated as efficiently as ordinary
procedure calls.

3. sfmr-programs can effectively be transformed into formally equivalent
programs without statically nested procedures:

⇒ IDFA-applications on programs without statically nested procedures
are often more efficient.

® Formal equivalence of programs induces their functional equivalence [La4].

^ This directly implies that sfmr-programs are universal in the sense that
for every program satisfying the modularity property there is a formally
equivalent program satisfying the sfmr-property.

^ The set of programs satisfying the sfmr-property is a proper (decidable)
subset of the set of programs satisfying the modularity property [Ol1].

^ In [La2] the transformation is developed for Algol60-P, which stands for
pure Algol60. Details on Algol60-P can be found in [La1].

We remark that the HO-DFA of our framework is primarily designed as an
efficient preprocess for the interprocedural analysis of sfmr-programs with
formal procedure calls. However, the problem underlying it, formal
callability, and the correctness, precision, and complexity results
concerning potential passability are closely related to classical problems
of compiler optimization like formal reachability and formal recursivity. In
contrast to the latter two problems, whose decidability and inherent
complexity have been studied thoroughly in the literature (cf. [Ar1, Ar2,
Ar3, La1, Wi1, Wi2]), formal callability has, to the knowledge of the author,
not yet been investigated systematically (except for the pragmatic approaches
of call graph analysis (cf. [CCHK, HK, Lak, Ry, Wal])).^ Our main result
concerning formal callability, namely that it can be computed in quadratic
time for programs of mode depth 2 without global formal procedure parameters
if there is a limit on the length of parameter lists, is a direct analogue to
the central result of [Ar3] that the coarser problem of formal reachability
is decidable in quadratic time for programs of mode depth 2 and a limit on
the length of parameter lists. In addition, our HO-DFA yields a new approach
for computing alias information for reference parameters, which is
conceptually significantly different from traditional approaches to the
alias problem (cf. [Ban, Co, CpK2, LH, We]).

1.2.2 The Application 

After developing the new framework for interprocedural program
optimization, we demonstrate its power and flexibility by means of a
practically relevant application, the computationally and lifetime optimal
elimination of interprocedurally partially redundant computations in a
program. The data flow analyses involved can be derived rather
straightforwardly from their intraprocedural counterparts. However, we
demonstrate that in the interprocedural setting computationally and lifetime
optimal results are in general impossible. Therefore, we propose a natural
sufficient constraint together with an algorithm which meets both optimality
criteria for a large class of programs. The algorithm evolves as the
interprocedural counterpart of the algorithm for lazy code motion of
[KRS1, KRS2]. Under the constraint proposed, the new algorithm is not only
more powerful than its heuristic predecessors (cf. [Mo, MR2, SW]), but also
more efficient.

^ This also holds for the application developed in Chapter 10 and Chapter 11.
However, the trade-off between the costs of eliminating static procedure
nesting, which usually increases the program size, and the efficiency gain of
IDFA must be taken into account.

^ In [Ar2] a similar notion was introduced defining when a procedure π
formally calls a procedure π'. This, however, is still a coarser notion than
formal callability, which defines when a specific call site in π formally
calls procedure π'.

1.2.3 Related Work

There is a large variety of approaches to interprocedural DFA and optimiza- 
tion. In essence, they can be classified into three major groups. Approaches, 
which 

1. reduce interprocedural DFA to intraprocedural DFA (e.g., by means of 
procedure inlining), or which use interprocedural information (e.g., which 
variables are used or modified by a procedure call) for intraprocedural 
program optimization [AGS, HC, Ri, RG]. 

2. deal with specific problems of interprocedural program optimization, 
e.g., constant propagation [CCpKT, GT], loop invariant code motion
[SW], partial redundancy elimination [Mo, MR2], register allocation [SH],
branch elimination [BGS], slicing [HRB], alias information [Bu, Co,
CpK2, CpK3, De, ERH, LH, LR, My, We], or information whether
a specific variable is used, modified or preserved by a procedure call
[A112, Ban, Bth, Bu, CpK1, HS, Ro].

3. aim at a unifying framework for interprocedural DFA and interprocedural
program optimization [Ba1, Ba2, Bou, CC2, SRH2, KS1, JM, RHS, SP,
SRH1].

Approaches of the first group are not truly interprocedural. They aim 
at performing traditional intraprocedural techniques “more accurately” by 
weakening the worst-case assumptions of specific procedure calls. The results 
of Richardson and Ganapathi [Ri, RG] show that the effect of such approaches 
on the efficiency of a program is limited in practice. 

Approaches of the second group are tailored for specific applications. Usu- 
ally, it is not clear how to modify these specialized approaches in order to 
arrive at a uniform framework for interprocedural DFA or interprocedural 
program optimization. 

Approaches of the third group have been pioneered by Cousot and Cousot
[CC2], Barth [Ba1, Ba2], Sharir and Pnueli [SP], Jones and Muchnick [JM],
and more recently by Bourdoncle [Bou]. These approaches mainly address the
correctness of an interprocedural DFA, and their applicability is limited as
they do not properly deal with local variables of recursive procedures, which
would require a mechanism to store information about local variables when
treating a recursive call. The proper treatment of local variables and
parameters of recursive procedures was a major achievement of the stack-based
framework of Knoop and Steffen [KS1]. Fundamental was the introduction
of so-called DFA-stacks and return functions, which allow an analysis to dis- 
tinguish between effects on global and local variables after returning from 
a procedure call. This turned out to be the key for constructing a generic 
algorithm, which computes the “intuitively desired” solution of an interpro- 
cedural DFA-problem, and handles local variables properly. 
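
The role of return functions can be illustrated on a recursive call. The Python sketch below is a simplification under assumed conditions (information maps variables to constants, the callee receives value parameters only, and globals are identified by a given set); it is not the formal definition of [KS1]. After the call, information about the caller's local variables is taken from the point immediately before the call, while information about global variables is taken from the end of the called procedure.

```python
# Hypothetical setting (assumption of this sketch): DFA information maps variable names
# to known constant values; the callee receives value parameters only, so it cannot
# change the caller's local variables.
def return_function(pre_call, callee_exit, global_vars):
    """Combine the information valid after a call from its two sources."""
    after = {v: c for v, c in pre_call.items() if v not in global_vars}       # caller's locals survive
    after.update({v: c for v, c in callee_exit.items() if v in global_vars})  # globals come from callee
    return after


# Recursive procedure with local x and global g: before the recursive call x = 1 and g = 0;
# at the end of the callee, its own incarnation of x is 7 and g is 5.
pre_call = {"x": 1, "g": 0}
callee_exit = {"x": 7, "g": 5}
print(return_function(pre_call, callee_exit, {"g"}))  # {'x': 1, 'g': 5}
```

Simply propagating the callee's exit information would wrongly report x = 7, which holds only for the callee's own incarnation of x.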

The framework presented here is based on the approach of [KS1], but
in addition to the specification, correctness and precision of interprocedural
DFA-algorithms, the framework here also addresses the specification,
correctness and optimality of interprocedural program optimizations based
thereon.
Moreover, it is enhanced in order to capture reference and procedure pa- 
rameters, external variables, and external procedures making the framework 
unique. In addition, the following features are central. (1) The framework is 
general: it is not restricted to certain problem classes. (2) It is optimal: DFA- 
algorithms and optimizations based thereon are precise even in the presence 
of recursive procedures. (3) It is tailored for practical use: all details
which are irrelevant for a specific application are hidden. This point is
particularly important in practice because the framework allows the
construction of optimal interprocedural program optimizations in a cookbook
style. This contrasts with previous approaches, which often have a
foundational character: using their techniques usually requires a detailed
understanding of the underlying frameworks.

Recently, two efficiency-oriented approaches for interprocedural DFA have
been introduced by Reps, Horwitz, and Sagiv, and Duesterwald, Gupta, and 
Soffa, respectively, which also address the treatment of local variables. The 
framework presented here can be regarded as the theoretical backbone for 
proving the correctness of these approaches, which we therefore discuss in 
more detail here. 

Reps, Horwitz, and Sagiv proposed in [RHS] an algorithm for solving
interprocedural DFA-problems over finite lattices in a way which captures
global and local variables, and value and reference parameters. They achieve
the proper treatment of local variables by means of two separate functions
which mimic the return functions originally introduced in [KS1]:^® one
function extracts the globally relevant part from the data flow information
valid at the termination time of the called procedure; the other extracts,
from the data flow information valid immediately before the call, the
information about local variables which must be re-established after
finishing the call. This implicit treatment of return functions allows them
to reduce an interesting class of DFA-problems to graph reachability
problems that can be solved efficiently, and to compute the effect of
procedure calls in a by-need fashion. Moreover, it enables the treatment of
some potentially infinite abstract domains [SRH1]. However, the introduction
of the two “artificial” functions introduces additional paths in the graph
used to represent the program, which do not correspond to a standard program
execution path. As a consequence, when one encodes a problem in their
framework(s), the proof of correctness for the encoding in the sense of the
meet over all paths approach must account for these paths.
This proof can be simplified by means of the theorems applying to the
framework here: it is sufficient to prove the equivalence to the
IMFP-solution. The Interprocedural Coincidence Theorem 8.4.2 then yields
that it coincides, as desired, with the IMOP-solution, and thus with the
intuitively desired solution of the IDFA-problem under consideration.
Moreover, following the lines of [KRS4], proving the equivalence to the
IMFP-solution does not require the consideration of DFA-stacks at all,
which must be part of any sound IMOP-solution capturing local variables and
value parameters.

^ This was made explicit in a private communication with Thomas Reps
(November 1994).

Duesterwald, Gupta, and Soffa proposed in [DGS1, DGS2] an algorithm
for demand-driven interprocedural DFA, which works for programs with
global and local variables, value and reference parameters. The point of their
algorithm is to compute data flow information for a given program point
without performing an exhaustive analysis of the argument program. In
practice, this may be used, e.g., for debugging a program during its
development. The effect of the return functions of [KS1] for handling local
variables can be encoded problem-specifically in the binding functions, which
realize the mapping between the address space of the calling procedure and
the called procedure. As for [RHS], a formal correctness proof for this
approach depends on the correspondence between the computed solution and
the operational understanding of a procedural program, and can be
established by means of the results applying to the framework presented
here. We remark that another and conceptually quite different algorithm for
demand-driven DFA, which is based on magic-set transformations, has recently
been proposed by Reps [Re1, Re2].

1.2.4 Organization of the Monograph 

The monograph consists of four parts. In the first part we revisit the standard
framework for intraprocedural program optimization in a cookbook view.
Subsequently, we illustrate the framework by means of the practically
relevant algorithm for lazy code motion (LCM) of [KRS1, KRS2]. This algorithm
was the first one to eliminate partially redundant computations in a
procedure in a computationally and lifetime optimal fashion. In contrast to
the presentation of [KRS1, KRS2], the essential step of proving the precision
of the DFA-algorithms involved in the LCM-transformation is here based
on the Coincidence Theorem 2.2.2. Whereas intraprocedurally the impact of
this different approach is mostly technical, it turns out to be quite
advantageous interprocedurally because it significantly simplifies and
shortens the precision proofs.

After the introductory first part, the second and the third part are cen- 
tral for the monograph. In the second part we introduce the new framework 
for interprocedural program optimization. Besides the treatment of formal 
procedure calls, and external variables and procedures, the cookbook view 
of the interprocedural framework is particularly important as it stresses the 
analogies and differences to the intraprocedural setting. Subsequently, we il- 
lustrate in the third part the interprocedural framework by means of the 
interprocedural extensions of the algorithms for busy and lazy code motion. 
We show that interprocedurally computational and lifetime optimality are
in general impossible to achieve, but that both criteria can be met under a
natural sufficient side-condition for a large class of programs.

In the fourth part, finally, we discuss a variety of pragmatic aspects related 
to interprocedural code motion and the new framework, and give directions 
to future work. 

Note that highlighting the analogies and differences of the intraprocedural 
and interprocedural setting as mentioned above, as well as underlining the 
similarity of the proceeding for different data flow problems, when applying 
the framework, is a central concern of our presentation. To this end proofs 
for (most) theorems applying to the intraprocedural setting are given in full 
detail, even though they are corollaries of their interprocedural counterparts. 
In particular, this concerns theorems on properties of DFA-algorithms con- 
sidered for illustrating the intra- and interprocedural framework. However, 
this presentation principle, sometimes even repeating a paragraph almost
verbatim, is not restricted to proofs. It also recurs in the presentation of the
underlying specifications of the DFA-algorithms as well as in the presenta- 
tion of the intra- and interprocedural frameworks, and culminates in their 
cookbook summaries. 

We conclude this section with a more detailed sketch of the contents of 
the following chapters. 

— Part I: Introduction 

— Chapter 2 revisits the standard framework for intraprocedural program 
optimization in a cookbook view. 

— Chapter 3 and Chapter 4 illustrate the intraprocedural framework by
recalling the transformations for busy (BCM-) and lazy code motion
(LCM-transformation), and specifying the DFA-algorithms for computing the
program properties involved in the BCM- and LCM-transformation,
respectively. The precision proofs of the DFA-algorithms differ from the
original proofs of [KRS1, KRS2], and are thus explicitly given.

— Part II: The Framework 

— Chapter 5 introduces the programming language representing the common
core of Algol-like languages, which we consider during the development
and application of the interprocedural framework.

— Chapter 6 presents the HO-DFA dealing with formal procedure calls. 
Central is the notion of formal callability, which we show to be a
refinement of the formal reachability problem. Like formal reachability,
formal callability turns out to be undecidable in general. Thus,
we introduce a correct (safe) approximation of formal callability, called 
potential passability, which can efficiently be computed. Moreover, for 
programs of mode depth 2 without global formal procedure parameters 
we prove that formal callability and potential passability coincide. 

— Chapter 7 completes the setting of IDFA. In particular, we fix the inter- 
face between HO-DFA and IDFA. Additionally, we introduce flow graph 
systems and interprocedural flow graphs as representations of programs
with procedures. 

— Chapter 8 presents the stack-based framework of IDFA. Central is the 
introduction of DFA-stacks and return functions, which are the prereq- 
uisite for defining the interprocedural versions of the meet over all paths 
approach and the maximal fixed point approach. The solutions of these 
approaches define the “specifying” and the “algorithmic” solution of an 
IDFA-problem, respectively. The main results of this chapter are the 
Interprocedural Correctness Theorem 8.4.1 and the Interprocedural Co- 
incidence Theorem 8.4.2, which give sufficient conditions for the correct- 
ness and the precision of the algorithmic solution with respect to the 
specifying solution of an IDFA-problem. Finally, the specification of an 
IDFA-problem is formalized, and the generic fixed point algorithms for 
computing the algorithmic solution of a given IDFA-problem are pre- 
sented. 

— Chapter 9 summarizes the presentation of the second part. In this chap- 
ter we take the view of a designer of an interprocedural program opti- 
mization, and arrive at a cookbook for optimal interprocedural program 
optimization. 

— Part III: The Application 

— Chapter 10 illustrates the framework for interprocedural program op- 
timization by means of the interprocedural extensions of the BCM- 
transformation and the LCM-transformation. It demonstrates that in 
contrast to the data flow analyses, which can be transferred rather
straightforwardly from the intraprocedural setting, transformations based
thereon usually require additional care. In particular, it shows that com- 
putationally and lifetime optimal results are interprocedurally in general 
impossible. Under a natural side-condition, however, the interprocedural 
extensions of the BCM- and LCM-transformation are proved to satisfy 
for a large class of programs both optimality criteria. 

— Chapter 11 presents the IDFA-algorithms for computing the program 
properties involved in the IBCM- and the ILCM-transformation together
with the proofs of their precision.

— Part IV: Conclusion 

— Chapter 12 discusses a variety of pragmatic aspects related to interpro- 
cedural code motion and the framework presented, and gives directions 
to future work. 

The monograph closes with the bibliography and an index for simplifying 
the access to definitions and technical terms. In particular, the index entry 
“notations” can be used as a quick-reference to symbols and abbreviations. 




2. The Intraprocedural Framework 



In this chapter we revisit the standard framework for intraprocedural pro- 
gram optimization. Beyond recalling the framework for the convenience of 
the reader, the point of this revision is to structure and summarize it in a 
fashion, which allows the construction of provably optimal program optimiza- 
tions in a cookbook style. This is important as in Part II we will show how to 
lift this cookbook oriented presentation of the intraprocedural framework to 
the interprocedural setting. As a by-product, this reveals and highlights the 
essential analogies and differences of the intraprocedural and interprocedural 
setting. 



2.1 Intraprocedural Program Optimization 

Program optimization is traditionally used as a general term for program
transformations which are intended to improve the run-time or storage
efficiency of a program. Thus, speaking of program optimization does not
usually imply the generation of truly “optimal” programs as suggested by the term.
In practice, one is often satisfied by heuristically based optimizations. Even 
transformations which sometimes impair the efficiency of a program are often 
considered program optimizations. Cytron, Lowry, and Zadeck distinguish 
strict transformations, which are required to always improve the efficiency, 
and non-strict transformations, which sometimes may fail to do this (cf. 
[CLZ]). 

In contrast to these pragmatic approaches, we are interested in optimal
program optimization, i.e., in program transformations which are provably
optimal with respect to formal optimality criteria.^ Toward this end we
decompose program optimization into two steps. In the first step, we fix a
class of program transformations $\mathcal{T}$ together with a formal
optimality criterion $\mathcal{O}$. In the second step, we fix a
transformation $Tr_{opt} \in \mathcal{T}$, and prove that it is optimal with
respect to $\mathcal{O}$, or more briefly, that it is $\mathcal{O}$-optimal.
In general, the ingredients of this two-step approach must be defined by the
designer of the optimization. Usually, the optimality criterion $\mathcal{O}$
is based on a pre-order, which fixes the standard of comparison between
different programs. The program transformations of $\mathcal{T}$ are
typically defined in terms of a set of program properties $\Phi$, which fix
the side conditions under which a transformation is applicable. The validity
of these properties must be verified before the transformation can be
performed. This is the task of data flow analysis (DFA), which usually
precedes every optimizing program transformation. Of course, in order to
guarantee that the transformation induced by the results of the DFA is
correct or even $\mathcal{O}$-optimal, the program properties involved must
be computed precisely by the DFA (or at least conservatively, i.e., safely
approximated). This leads to the notions of $\varphi$-precise and
$\varphi$-correct DFA-algorithms, which we will consider in more detail in
the following sections presenting our two-step approach for optimal program
optimization.

^ In the following we will focus on run-time improving transformations,
which are the major concern in practice. The framework, however, applies to
storage improving transformations as well.

J. Knoop: Optimal Interprocedural Program Optimization, LNCS 1428, pp. 15-29, 1998.

After recalling flow graphs as an appropriate representation of procedures 
in Section 2.1.1, we introduce our two-step scheme of optimal program op- 
timization in Section 2.1.2, and informally present the notion of provably 
precise (correct) DFA-algorithms in Section 2.1.3. Subsequently, we recall in 
Section 2.2 the theory of abstract interpretation, which provides the the- 
oretical foundation of precise (correct) DFA-algorithms, and give a formal 
definition of a DFA-specification. Finally, we summarize in Section 2.3 our 
revision of the intraprocedural framework for optimal program optimization 
in a concise and structured form, which allows us to construct and prove the 
optimality and precision of program transformations and DFA-algorithms, 
respectively, in a cookbook style. 

2.1.1 Procedures and Flow Graphs 

Intraprocedural program optimization is characterized by a separate and
independent investigation of the procedures of a program. As usual, we
represent the procedures of a program as directed flow graphs
$G = (N, E, \mathbf{s}, \mathbf{e})$ with node set $N$ and edge set $E$.^
Nodes $n \in N$ represent the statements, and edges $(n, m) \in E$ the
nondeterministic branching structure of the underlying procedure; $\mathbf{s}$
and $\mathbf{e}$ denote the unique start node and end node of $G$, which are
assumed to have no incoming and outgoing edges, respectively.^

For every flow graph G, we denote the set of immediate predeces- 
sors and successors of a node n by predG{n)=df {m\{m,n) G E} and 
succG{n)=df {m\ (n,m) G E}. A finite path in G is a sequence (ni, . . . , Uq) 
of nodes such that (rij, rij+i) G E for j G {1, . . . , q— 1}. Moreover, Pg[w, n] 
denotes the set of all finite paths from m to n, Pc[TO,n[ the set of all fi- 
nite paths from to to a predecessor of n, and Pg]to, n] the set of all finite 
paths from a successor of to to n. Program paths reaching the end node 

^ The construction of flow graphs is described in [Alll]. 

® This does not impose any restrictions as it is always possible to introduce a new 
start node and stop node enjoying these properties by need. 
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of G are called terminating. We assume that every node of a flow graph 
G lies on a terminating path starting in s. The operator “ ; ” denotes the 
concatenation of two paths. The length of a path p is given by the num- 
ber of its node occurrences and denoted by Ap. In particular, we denote the 
unique path of length 0 by e. For a path p and an index 1 < z < Ap, 
the z-th component of p is denoted by pi. A path q is a subpath of p, in 
signs g C p, if there is an index i G {1, . . . , Ap} such that z -I- Aq — 1 < Ap 
and Qj =pi+j-i for all j G {l,...,Ag}. Moreover, for z, j < Ap, p[z,j], 
p[i,j[, and p]z,j] denote the subpaths (jii,...,nj), (rii, . . . and 

(rii+i, . . . , rij) of p, respectively. If z > j, p[i,j] means e. The subpath rela- 
tion C defined on paths can naturally be extended to sets of paths P and Q: 
P G Q <^:^j^f\/p G P 3q G Q. p G q. Finally, for every path p = (ni, . . . , n\^) 
in G, we introduce the reversed path p of p defined by {n\^, . . . ,rzi). 
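To make the path notation concrete, the following Python sketch represents a flow graph by its edge set and a path by a tuple of nodes. The encoding and all function names (`pred`, `succ`, `is_path`, `sub`, `is_subpath`, `reverse`) are illustrative assumptions, not part of the framework. Indices are 1-based as in the text, and $p[i, j[$ and $p]i, j]$ can be obtained as `sub(p, i, j - 1)` and `sub(p, i + 1, j)`.

```python
from typing import Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]
Path = Tuple[Node, ...]


def pred(edges: Set[Edge], n: Node) -> Set[Node]:
    """pred_G(n) = { m | (m, n) in E }."""
    return {m for (m, k) in edges if k == n}


def succ(edges: Set[Edge], n: Node) -> Set[Node]:
    """succ_G(n) = { m | (n, m) in E }."""
    return {k for (m, k) in edges if m == n}


def is_path(edges: Set[Edge], p: Path) -> bool:
    """(n_1, ..., n_q) is a path iff (n_j, n_{j+1}) is an edge for all j < q."""
    return all((p[j], p[j + 1]) in edges for j in range(len(p) - 1))


def sub(p: Path, i: int, j: int) -> Path:
    """p[i, j] with 1-based indices; the empty path epsilon if i > j."""
    return p[i - 1:j] if i <= j else ()


def is_subpath(q: Path, p: Path) -> bool:
    """q is a subpath of p: some start index i with i + len(q) - 1 <= len(p) matches q."""
    return any(p[i:i + len(q)] == q for i in range(len(p)) if i + len(q) <= len(p))


def reverse(p: Path) -> Path:
    """The reversed path (n_lambda, ..., n_1)."""
    return tuple(reversed(p))


# Hypothetical example graph: s -> a -> b -> e and a -> e.
edges = {("s", "a"), ("a", "b"), ("a", "e"), ("b", "e")}
p = ("s", "a", "b", "e")
```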

2.1.2 Provably Optimal Program Transformations 

Let $G$ be a flow graph, $\mathcal{T}$ a class of program transformations, and $Tr$ a transformation of $\mathcal{T}$. Additionally, let $G_{Tr}$ denote the flow graph resulting from the application of $Tr$ to $G$, and let $\mathcal{G}_{\mathcal{T}} =_{df} \{G\} \cup \{G_{Tr} \mid Tr \in \mathcal{T}\}$ denote the set of all programs resulting from a transformation of $\mathcal{T}$, extended by $G$ itself. By means of these definitions, we can present our two-step approach to optimal program optimization.



Step 1: Fix a class of program transformations $\mathcal{T}$ and
a relation $\le_{\mathcal{T}} \subseteq \mathcal{G}_{\mathcal{T}} \times \mathcal{G}_{\mathcal{T}}$



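Intuitively, the relation $\le_{\mathcal{T}}$ compares the "quality" of transformations $Tr, Tr' \in \mathcal{T}$. Usually, $\le_{\mathcal{T}}$ is a pre-order, and $G_{Tr} \le_{\mathcal{T}} G_{Tr'}$ can informally be read as "$Tr$ is at least as good as $Tr'$". This directly induces a formal optimality criterion, which we call $\mathcal{O}_{\le_{\mathcal{T}}}$-optimality.

Definition 2.1.1 ($\mathcal{O}_{\le_{\mathcal{T}}}$-Optimality).

A transformation $Tr \in \mathcal{T}$ is $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal if $G_{Tr} \le_{\mathcal{T}} G_{Tr'}$ holds for all $Tr' \in \mathcal{T}$.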

Step 2: Fix a transformation $Tr_{opt} \in \mathcal{T}$ and prove that
it is $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal.
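For a finite set of candidate transformations, Definition 2.1.1 can be checked by brute force, which may help to see what Steps 1 and 2 require. In the Python sketch below, a program is summarized by the number of computations it executes on each named path, and $\le_{\mathcal{T}}$ is the pointwise comparison of these counts. This encoding, the transformation names, and the helper `is_optimal` are hypothetical and only illustrate the shape of the criterion; the identity transformation `"id"` stands for $G$ itself.

```python
from typing import Callable, Dict, Iterable

# Hypothetical encoding: a program is summarized by the number of
# computations it executes on each (named) program path.
Program = Dict[str, int]


def fewer_or_equal_computations(g: Program, h: Program) -> bool:
    """An example pre-order <=_T: g is at least as good as h on every path."""
    return all(g[path] <= h[path] for path in h)


def is_optimal(tr_opt: str,
               transformations: Iterable[str],
               apply: Callable[[str], Program],
               leq: Callable[[Program, Program], bool]) -> bool:
    """Definition 2.1.1: G_{Tr_opt} <=_T G_{Tr'} must hold for every Tr' in T."""
    g_opt = apply(tr_opt)
    return all(leq(g_opt, apply(tr)) for tr in transformations)


# "id" is the identity transformation, i.e., it yields G itself.
results = {
    "id": {"p1": 2, "p2": 1},
    "tr1": {"p1": 1, "p2": 1},
    "tr2": {"p1": 1, "p2": 2},
}
```

Here `tr1` is optimal, whereas `tr2` is not, because it is worse than $G$ on path `p2`.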



As mentioned before, the transformations $Tr \in \mathcal{T}$ are typically defined in terms of a set of program properties $\Phi$. Every $\varphi \in \Phi$ is a pair of functions $(\textit{N-}\varphi, \textit{X-}\varphi)$, whose domain and range are the set of nodes of $G$ and the set $\mathbb{B}$ of the Boolean truth values $true$ and $false$, respectively, i.e.,

$\textit{N-}\varphi, \textit{X-}\varphi : N \to \mathbb{B}$

Intuitively, the truth values $\textit{N-}\varphi(n)$ and $\textit{X-}\varphi(n)$, $n \in N$, indicate whether $\varphi$ holds at the entry and at the exit of the argument node $n$, respectively.

It is worth noting that proving the $\mathcal{O}_{\le_{\mathcal{T}}}$-optimality of a given program transformation $Tr_{opt} \in \mathcal{T}$ does not rely on the particular choice of an algorithm $A$ computing the program properties involved in the definition of $Tr_{opt}$. In fact, proving that an algorithm $A$ computes a certain property of
interest can be done separately. This is important because it structures and 
simplifies the overall proof by decomposing it into two independent steps. 

2.1.3 Provably Precise Data Flow Analyses 

In the context of our two-step scheme for optimal program optimization, the task of DFA is to compute the program properties $\varphi$ involved in the transformation $Tr_{opt}$. Technically, this requires a static program analysis of $G$, which is performed by an appropriate DFA-algorithm computing the set of program points enjoying a program property $\varphi$ of interest. This raises the questions of correctness and precision of DFA-algorithms. Intuitively, a DFA-algorithm is $\varphi$-precise if it computes the set of nodes of $G$ enjoying $\varphi$ precisely, and it is $\varphi$-correct if it approximates this set of nodes conservatively, i.e., if it computes a subset of the nodes of $G$ enjoying $\varphi$.³ Once the DFA-algorithms have been proved precise for the program properties involved in $Tr_{opt}$, it is usually easy to perform the transformation itself, and the program resulting from the transformation is guaranteed to be $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal.⁴ DFA-algorithms based on abstract interpretation are theoretically well-founded; abstract interpretation has proved to be a powerful and uniform framework for static program analyses (cf. [CC1, CC3, CC4, Ma, Nie1, Nie2]).



2.2 Abstract Interpretation 

The central idea of abstract interpretation is to replace the "full" semantics of a procedure by a simpler, more abstract version tailored to a specific problem.⁵ Usually, an abstract interpretation consists of two components: a domain of relevant data flow information, and a local semantic functional, which specifies the effect of elementary statements on the domain under consideration. Together, both components induce two variants of a corresponding global abstract semantics of a flow graph: an operational one specifying the intuitively desired solution of a DFA-problem, and a denotational one inducing a computation procedure.



³ $\varphi$-precision and $\varphi$-correctness are formally defined in Section 2.2.6.

⁴ $\varphi$-correctness is usually not sufficient to draw this conclusion (cf. Section 2.3).

⁵ Here, to compute a program property $\varphi$.

2.2.1 Data Flow Information

The domain of data flow information is typically given by a complete semi-lattice⁶

$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top)$

with least element $\bot$ and greatest element $\top$. The elements of $\mathcal{C}$ are assumed to express the data flow information of interest. For practical applications, lattices satisfying the descending chain condition are particularly important.

Definition 2.2.1 (Descending Chain Condition).

The lattice $(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top)$ satisfies the descending chain condition if and only if for every subset $C' \subseteq \mathcal{C}$ and every sequence of elements of $C'$ with

$c_1 \sqsupseteq c_2 \sqsupseteq c_3 \sqsupseteq c_4 \sqsupseteq \ldots$

there is an index $k_0$ such that $c_j = c_{k_0}$ for every $j \ge k_0$.

In the following, $\mathcal{C}$ will always denote a complete semi-lattice.

2.2.2 Local Abstract Semantics 

The local abstract semantics of a flow graph $G$ is given by a semantic functional

$[\![\ ]\!] : N \to (\mathcal{C} \to \mathcal{C})$

which gives meaning to every node $n \in N$ in terms of a transformation on $\mathcal{C}$. Without loss of generality, we assume that $s$ and $e$ are associated with the identity on $\mathcal{C}$, denoted by $Id_{\mathcal{C}}$. Note that a local abstract semantics $[\![\ ]\!]$ can easily be extended to cover finite paths. For every path $p \in \mathbf{P}_G[m, n]$, we define:

$[\![ p ]\!] =_{df} \begin{cases} Id_{\mathcal{C}} & \text{if } p = \varepsilon \\ [\![ p[2, \lambda_p] ]\!] \circ [\![ p_1 ]\!] & \text{otherwise} \end{cases}$
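A minimal Python sketch of the extension of $[\![\ ]\!]$ to paths, assuming that the local semantics is given as a dictionary from nodes to functions and that lattice elements are frozensets (here: sets of available terms, a hypothetical example). The recursion mirrors the definition: $p_1$ is applied first, so the order of the nodes matters.

```python
from typing import Callable, Dict, Hashable, Sequence

Transfer = Callable[[frozenset], frozenset]


def path_semantics(local: Dict[Hashable, Transfer], path: Sequence[Hashable]) -> Transfer:
    """[[eps]] = Id_C and [[p]] = [[p[2, len]]] o [[p_1]] (p_1 is applied first)."""
    if not path:
        return lambda c: c
    first = local[path[0]]
    rest = path_semantics(local, path[1:])
    return lambda c: rest(first(c))


# Hypothetical local semantics on sets of available terms:
local = {
    "s": lambda c: c,                  # start node: identity
    "e": lambda c: c,                  # end node: identity
    "gen": lambda c: c | {"a+b"},      # node computing a+b
    "kill": lambda c: c - {"a+b"},     # node modifying an operand of a+b
}
```

For instance, `path_semantics(local, ("s", "kill", "gen", "e"))` maps the empty set to `frozenset({"a+b"})`, whereas swapping `gen` and `kill` yields the empty set.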

2.2.3 Global Abstract Semantics 

As mentioned before, the global abstract semantics of $G$ results from one of the following two globalization approaches of a local abstract semantics: the "operational" meet over all paths (MOP) approach, and the "denotational" maximal fixed point (MFP) approach in the sense of Kam and Ullman [KU2]. The solutions of these approaches define the specifying and the algorithmic solution of a DFA-problem, respectively. The MOP-approach directly mimics possible program executions: it "meets" (intersects) all information belonging to a program path reaching the program point under consideration.

⁶ A complete semi-lattice is a complete lattice as well. DFA-algorithms, however, usually consider only the join operation or the meet operation of $\mathcal{C}$. We emphasize this by considering $\mathcal{C}$ a semi-lattice.

Definition 2.2.2 (The MOP-Solution).

Given a flow graph $G = (N, E, s, e)$, a complete semi-lattice $\mathcal{C}$, and a local abstract semantics $[\![\ ]\!]$, the MOP-solution is defined by:⁷

$\forall c_s \in \mathcal{C}\ \forall n \in N.\ \mathit{MOP}_{([\![\ ]\!], c_s)}(n) =_{df} (\textit{N-MOP}_{([\![\ ]\!], c_s)}(n), \textit{X-MOP}_{([\![\ ]\!], c_s)}(n))$

where

$\textit{N-MOP}_{([\![\ ]\!], c_s)}(n) =_{df} \bigsqcap \{ [\![ p ]\!](c_s) \mid p \in \mathbf{P}_G[s, n[ \}$

$\textit{X-MOP}_{([\![\ ]\!], c_s)}(n) =_{df} \bigsqcap \{ [\![ p ]\!](c_s) \mid p \in \mathbf{P}_G[s, n] \}$
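On acyclic flow graphs the MOP-solution can be computed literally by enumerating all paths, which is useful as a reference when testing an MFP-implementation. The Python sketch below assumes an adjacency-list encoding, frozensets as lattice elements with intersection as meet, and reads $\mathbf{P}_G[s, n[$ as the paths of $\mathbf{P}_G[s, n]$ with their last node removed (for $n \ne s$ these are exactly the paths from $s$ to a predecessor of $n$). On flow graphs with loops the set of paths is infinite and the enumeration does not terminate.

```python
from functools import reduce
from typing import Callable, Dict, Hashable, Iterator, List, Tuple

Node = Hashable
Path = Tuple[Node, ...]


def paths(succ: Dict[Node, List[Node]], src: Node, dst: Node) -> Iterator[Path]:
    """All paths from src to dst; terminates only on acyclic flow graphs."""
    if src == dst:
        yield (src,)
        return
    for m in succ.get(src, []):
        for rest in paths(succ, m, dst):
            yield (src,) + rest


def run(local: Dict[Node, Callable], path: Path, c):
    """[[p]](c): the nodes of the path are applied from left to right."""
    return reduce(lambda acc, n: local[n](acc), path, c)


def mop(succ, local, s, n, c_s, top, meet):
    """(N-MOP(n), X-MOP(n)) by brute-force path enumeration; the meet over no paths is top."""
    full = list(paths(succ, s, n))
    entry = reduce(meet, (run(local, p[:-1], c_s) for p in full), top)
    exit_ = reduce(meet, (run(local, p, c_s) for p in full), top)
    return entry, exit_


# Hypothetical example: a diamond in which only node "a" computes x+y.
def identity(c):
    return c


def meet(x, y):
    return x & y


succ = {"s": ["a", "b"], "a": ["j"], "b": ["j"], "j": ["e"], "e": []}
local = {"s": identity, "a": lambda c: c | {"x+y"}, "b": identity,
         "j": identity, "e": identity}
TOP = frozenset({"x+y"})
```

At the join node `j`, nothing is available, because the path through `b` does not compute `x+y`.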

This definition directly reflects our desires. However, it does not specify an effective computation procedure in general.⁸ In contrast, the MFP-approach iteratively approximates the greatest solution of a system of equations which express consistency between pre-conditions and post-conditions expressed in terms of $\mathcal{C}$ with respect to a start information $c_s \in \mathcal{C}$:

Equation System 2.2.3.

$\mathit{pre}(n) = \begin{cases} c_s & \text{if } n = s \\ \bigsqcap \{ \mathit{post}(m) \mid m \in pred_G(n) \} & \text{otherwise} \end{cases}$

$\mathit{post}(n) = [\![ n ]\!](\mathit{pre}(n))$



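Denoting the greatest solution of Equation System 2.2.3 with respect to a given start information $c_s$ by $\mathit{pre}_{c_s}$ and $\mathit{post}_{c_s}$, respectively, the solution of the MFP-approach is defined by:

Definition 2.2.4 (The MFP-Solution).

Given a flow graph $G = (N, E, s, e)$, a complete semi-lattice $\mathcal{C}$, and a local abstract semantics $[\![\ ]\!]$, the MFP-solution is defined by:

$\forall c_s \in \mathcal{C}\ \forall n \in N.\ \mathit{MFP}_{([\![\ ]\!], c_s)}(n) =_{df} (\textit{N-MFP}_{([\![\ ]\!], c_s)}(n), \textit{X-MFP}_{([\![\ ]\!], c_s)}(n))$

where

$\textit{N-MFP}_{([\![\ ]\!], c_s)}(n) =_{df} \mathit{pre}_{c_s}(n) \qquad \textit{X-MFP}_{([\![\ ]\!], c_s)}(n) =_{df} \mathit{post}_{c_s}(n)$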

In general, this definition leads to a suboptimal but algorithmic description. Thus, there are two global notions of semantics here: an operational one, which precisely mimics our intuition, and a denotational one, which has an algorithmic character and induces a computation procedure. In fact, we consider the MOP-approach as a means for the direct specification of a DFA, and the MFP-approach as its algorithmic realization.⁹ This raises the questions of MOP-correctness and MOP-precision of such algorithms, which have elegantly been answered by Kildall, and by Kam and Ullman.

⁷ Remember, "N" and "X" stand for entry and exit of a node, respectively.

⁸ Think, e.g., of loops in a program.

⁹ An explicit generic algorithm is given in Section 2.2.5.

2.2.4 MOP-Correctness and MOP-Precision

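The key to answering the questions of MOP-correctness and MOP-precision are the following two notions on functions on a complete semi-lattice $(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top)$. A function $f : \mathcal{C} \to \mathcal{C}$ on $\mathcal{C}$ is called

— monotonic if and only if $\forall c, c' \in \mathcal{C}.\ c \sqsubseteq c'$ implies $f(c) \sqsubseteq f(c')$

— distributive if and only if $\forall\, \emptyset \ne C' \subseteq \mathcal{C}.\ f(\bigsqcap C') = \bigsqcap \{ f(c) \mid c \in C' \}$

We recall that distributivity is a stronger requirement than monotonicity in the following sense:

Lemma 2.2.1.

A function $f : \mathcal{C} \to \mathcal{C}$ is monotonic iff $\forall C' \subseteq \mathcal{C}.\ f(\bigsqcap C') \sqsubseteq \bigsqcap \{ f(c) \mid c \in C' \}$.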
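On a small finite lattice both properties can be checked exhaustively. The Python sketch below uses the power set of a two-element universe, ordered by $\subseteq$ with intersection as meet, and tests three example functions, all of them hypothetical: a gen/kill function of the kind used in bit-vector problems, which is distributive; a function that is monotonic but not distributive; and complementation, which is not even monotonic.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List

Elem = FrozenSet[int]


def powerset_lattice(universe) -> List[Elem]:
    """All subsets of universe: ordered by subset inclusion, meet = intersection."""
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_monotonic(f: Callable[[Elem], Elem], lattice: List[Elem]) -> bool:
    """c <= c' implies f(c) <= f(c')."""
    return all(f(c) <= f(d) for c in lattice for d in lattice if c <= d)


def is_distributive(f: Callable[[Elem], Elem], lattice: List[Elem]) -> bool:
    """f(meet C') = meet { f(c) | c in C' } for every non-empty C' of lattice elements."""
    for r in range(1, len(lattice) + 1):
        for subset in combinations(lattice, r):
            lhs = f(frozenset.intersection(*subset))
            rhs = frozenset.intersection(*(f(c) for c in subset))
            if lhs != rhs:
                return False
    return True


U = frozenset({1, 2})
L = powerset_lattice(U)


def gen_kill(c: Elem) -> Elem:        # kills 1, generates 2
    return (c - frozenset({1})) | frozenset({2})


def all_or_nothing(c: Elem) -> Elem:  # monotonic, but not distributive
    return U if c else frozenset()


def complement(c: Elem) -> Elem:      # not monotonic
    return U - c
```

For `all_or_nothing`, the meet of `{1}` and `{2}` is the empty set, which is mapped to the empty set, while the meet of the two images is `U`; in line with Lemma 2.2.1, the left-hand side is below the right-hand side.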

As demonstrated by Kam and Ullman, monotonicity of the semantic functions 
is sufficient to guarantee correctness of the MFP-solution with respect to the 
MOP-solution. We have (cf. [KU2]):

Theorem 2.2.1 (Correctness Theorem).

Given a flow graph $G = (N, E, s, e)$, the MFP-solution is a correct approximation of the MOP-solution, i.e.,

$\forall c_s \in \mathcal{C}\ \forall n \in N.\ \mathit{MFP}_{([\![\ ]\!], c_s)}(n) \sqsubseteq \mathit{MOP}_{([\![\ ]\!], c_s)}(n)$

if all the semantic functions $[\![ n ]\!]$, $n \in N$, are monotonic.

Distributivity of the semantic functions yields precision. This follows from the well-known intraprocedural Coincidence Theorem 2.2.2 of Kildall [Ki1], and Kam and Ullman [KU2]:

Theorem 2.2.2 (Coincidence Theorem).

Given a flow graph $G = (N, E, s, e)$, the MFP-solution is precise for the MOP-solution, i.e.,

$\forall c_s \in \mathcal{C}\ \forall n \in N.\ \mathit{MFP}_{([\![\ ]\!], c_s)}(n) = \mathit{MOP}_{([\![\ ]\!], c_s)}(n)$

if all the semantic functions $[\![ n ]\!]$, $n \in N$, are distributive.



2.2.5 The Generic Fixed Point Algorithm 

In contrast to the MOP-approach, the MFP-approach is practically relevant because it directly specifies an iterative procedure for computing the MFP-solution. In this section we present a generic computation procedure, which is parameterized by the argument flow graph, the lattice of data flow information, the local semantic functional, and a start information.

Algorithm 2.2.5 (Computing the MFP-Solution).

Input: A flow graph $G = (N, E, s, e)$, a complete semi-lattice $\mathcal{C}$, a local semantic functional $[\![\ ]\!] : N \to (\mathcal{C} \to \mathcal{C})$, and a start information $c_s \in \mathcal{C}$.

Output: An annotation of $G$ with data flow information, i.e., an annotation with pre-information (stored in pre) and post-information (stored in post) of elements of $\mathcal{C}$, which represent valid data flow information at the entry and exit of every node of $G$, respectively.

Remark: The variable workset controls the iterative process. Its elements are pairs whose first components are nodes of $G$ and whose second components are elements of $\mathcal{C}$, which specify a new approximation for the pre-information of the node of the first component.

( Initialization of the annotation arrays pre and post, and the variable workset )
FORALL n ∈ N DO
    pre[n] := ⊤;
    post[n] := ⟦n⟧(⊤)
OD;
workset := { (s, c_s) } ∪ { (n, post[m]) | n ∈ succ_G(m) ∧ post[m] ⊏ ⊤ };

( Iterative fixed point computation )
WHILE workset ≠ ∅ DO
    LET (m, c) ∈ workset
    BEGIN
        workset := workset \ { (m, c) };
        meet := pre[m] ⊓ c;
        IF pre[m] ⊐ meet
        THEN
            pre[m] := meet;
            post[m] := ⟦m⟧(pre[m]);
            workset := workset ∪ { (n, post[m]) | n ∈ succ_G(m) }
        FI
    END
OD.
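Algorithm 2.2.5 can be transcribed into Python roughly as follows. This is a sketch under simplifying assumptions: lattice elements must support `==`, the meet is passed as a function, and the workset is a list used as a multiset instead of a set, which may process duplicate pairs but does not change the result. Since $meet \sqsubseteq pre[m]$ always holds, the test $pre[m] \sqsupset meet$ reduces to an inequality check, and likewise $post[m] \sqsubset \top$ reduces to $post[m] \ne \top$.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Node = Hashable


def mfp(nodes: Iterable[Node], edges: Iterable[Tuple[Node, Node]],
        local: Dict[Node, Callable], s: Node, c_s, top,
        meet: Callable) -> Tuple[Dict, Dict]:
    """Workset computation of (pre, post) in the style of Algorithm 2.2.5."""
    nodes = list(nodes)
    succ: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for m, n in edges:
        succ[m].append(n)

    # Initialization of the annotation arrays pre and post, and the workset.
    pre = {n: top for n in nodes}
    post = {n: local[n](top) for n in nodes}
    workset = [(s, c_s)]
    for m in nodes:
        if post[m] != top:              # post[m] is strictly below top
            workset.extend((n, post[m]) for n in succ[m])

    # Iterative fixed point computation.
    while workset:
        m, c = workset.pop()            # list used as a multiset
        new = meet(pre[m], c)
        if new != pre[m]:               # pre[m] is strictly above new
            pre[m] = new
            post[m] = local[m](pre[m])
            workset.extend((n, post[m]) for n in succ[m])
    return pre, post


# Hypothetical example: available expressions on a diamond-shaped graph.
def identity(c):
    return c


def gen_x_plus_y(c):
    return c | {"x+y"}


nodes = ["s", "a", "b", "j", "e"]
edges = [("s", "a"), ("s", "b"), ("a", "j"), ("b", "j"), ("j", "e")]
local = {"s": identity, "a": gen_x_plus_y, "b": identity, "j": identity, "e": identity}
pre, post = mfp(nodes, edges, local, "s", frozenset(), frozenset({"x+y"}),
                lambda x, y: x & y)
```

For this distributive example, `pre["j"]` is the empty set, since the path through `b` does not compute `x+y`; this agrees with the meet over all paths, as the Coincidence Theorem 2.2.2 predicts.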

Denoting the values of workset, pre[n], and post[n] after the $k$-th execution of the while-loop by $workset^k$, $pre^k[n]$, and $post^k[n]$, respectively, one can easily prove the following monotonicity property of Algorithm 2.2.5:

Lemma 2.2.2. If the semantic functions $[\![ n ]\!]$, $n \in N$, are monotonic, we have:

$\forall n \in N\ \forall k \in \mathbb{N}.\ (pre^{k+1}[n], post^{k+1}[n]) \sqsubseteq (pre^k[n], post^k[n])$

Moreover, by means of Lemma 2.2.2 we can prove:

Theorem 2.2.3 (Algorithm 2.2.5). 

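If the semantic functions $[\![ n ]\!]$, $n \in N$, are monotonic, we have:

1. If $\mathcal{C}$ satisfies the descending chain condition, there is a $k_0 \in \mathbb{N}$ with

$\forall k \ge k_0\ \forall n \in N.\ (pre^k[n], post^k[n]) = (pre^{k_0}[n], post^{k_0}[n])$

2. $\forall n \in N.\ \mathit{MFP}_{([\![\ ]\!], c_s)}(n) = (\bigsqcap \{ pre^k[n] \mid k \ge 0 \}, \bigsqcap \{ post^k[n] \mid k \ge 0 \})$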



As an immediate corollary of Theorem 2.2.3 we obtain: 

Corollary 2.2.1 (Algorithm 2.2.5). 

If the semantic functions $[\![ n ]\!]$, $n \in N$, are monotonic, we have:

1. Algorithm 2.2.5 terminates if $\mathcal{C}$ satisfies the descending chain condition.

2. After the termination of Algorithm 2.2.5, we have:

$\forall n \in N.\ \mathit{MFP}_{([\![\ ]\!], c_s)}(n) = (pre[n], post[n])$



2.2.6 Formal Specification of DFA-Algorithms

Following the presentation of the previous sections, a DFA $A$ is specified by a triple $(\mathcal{C}, [\![\ ]\!], c_s)$, which consists of a lattice $\mathcal{C}$, a local semantic functional $[\![\ ]\!]$, and a start information $c_s$. The specification of a DFA $A$ can directly be fed into the generic Algorithm 2.2.5, which yields the DFA-algorithm induced by $A$.

Definition 2.2.6 (Specification of a DFA-Algorithm).

The specification of a DFA $A$ is a triple $(\mathcal{C}, [\![\ ]\!], c_s)$, where

1. $\mathcal{C} = (\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top)$ is a complete semi-lattice,

2. $[\![\ ]\!] : N \to (\mathcal{C} \to \mathcal{C})$ is a local semantic functional, and

3. $c_s \in \mathcal{C}$ is a start information.

The DFA-algorithm $Alg(A)$ induced by $A$ results from instantiating the generic Algorithm 2.2.5 with $(\mathcal{C}, [\![\ ]\!], c_s)$. The MOP-solution of $A$ and the MFP-solution of $Alg(A)$ are the specifying and the algorithmic solutions of $A$, respectively.

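A DFA $A$ expresses the information of interest in terms of lattice elements. In general, there is thus a gap between a DFA $A$ and a program property $\varphi$, which is a Boolean predicate. This gap is closed by means of an interpretation function $Int$, which interprets the data flow information computed by $A$ as Boolean truth values, leading to the central notions of $\varphi$-correctness and $\varphi$-precision of a DFA.

Definition 2.2.7 ($\varphi$-Correctness and $\varphi$-Precision of a DFA).

Let $\varphi$ be a program property, $A = (\mathcal{C}, [\![\ ]\!], c_s)$ a DFA, and $Int : \mathcal{C} \to \mathbb{B}$ an interpretation of $\mathcal{C}$ in $\mathbb{B}$. Then $A$ is

1. $\varphi$-correct if and only if
(i) $\forall n \in N.\ Int(\textit{N-MOP}_{([\![\ ]\!], c_s)}(n)) \Rightarrow \textit{N-}\varphi(n)$
(ii) $\forall n \in N.\ Int(\textit{X-MOP}_{([\![\ ]\!], c_s)}(n)) \Rightarrow \textit{X-}\varphi(n)$

2. $\varphi$-precise if and only if
(i) $\forall n \in N.\ Int(\textit{N-MOP}_{([\![\ ]\!], c_s)}(n)) \Leftrightarrow \textit{N-}\varphi(n)$
(ii) $\forall n \in N.\ Int(\textit{X-MOP}_{([\![\ ]\!], c_s)}(n)) \Leftrightarrow \textit{X-}\varphi(n)$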



The notions of $\varphi$-correctness and $\varphi$-precision relate the specifying solution of a DFA $A$ to a property $\varphi$. In order to close the gap between the algorithmic solution of $A$ and $\varphi$, we next introduce the notions of MOP-correctness and MOP-precision, which relate the algorithmic solution to the specifying solution of $A$. Additionally, we introduce the notion of a terminating DFA. The separation of concerns resulting from these definitions significantly simplifies the proofs of $\varphi$-correctness and $\varphi$-precision of a DFA, because proving the algorithmic solution of a DFA to be terminating, and correct or precise for $\varphi$, reduces to checking the preconditions of the Correctness Theorem 2.2.1 or the Coincidence Theorem 2.2.2, respectively. This is particularly beneficial in the interprocedural case.

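Definition 2.2.8 (MOP-Correctness, MOP-Precision, Termination).

A DFA $A = (\mathcal{C}, [\![\ ]\!], c_s)$ is

1. MOP-correct if and only if $\mathit{MFP}_{([\![\ ]\!], c_s)} \sqsubseteq \mathit{MOP}_{([\![\ ]\!], c_s)}$

2. MOP-precise if and only if $\mathit{MFP}_{([\![\ ]\!], c_s)} = \mathit{MOP}_{([\![\ ]\!], c_s)}$

3. terminating if its induced DFA-algorithm $Alg(A)$ terminates.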

MOP-correctness, MOP-precision, and the termination of a DFA can usually be proved straightforwardly in a few substeps. This is a consequence of Theorem 2.2.4, which results from combining the Correctness Theorem 2.2.1, the Coincidence Theorem 2.2.2, and Corollary 2.2.1, and gives sufficient conditions guaranteeing these properties of a DFA.

Theorem 2.2.4 (MOP-Correctness, MOP-Precision, Termination).

A DFA $A = (\mathcal{C}, [\![\ ]\!], c_s)$ is

1. MOP-correct if all semantic functions $[\![ n ]\!]$, $n \in N$, are monotonic.

2. MOP-precise if all semantic functions $[\![ n ]\!]$, $n \in N$, are distributive.

3. terminating if (i) $\mathcal{C}$ satisfies the descending chain condition, and

(ii) all semantic functions $[\![ n ]\!]$, $n \in N$, are monotonic.

For convenience, we finally introduce an abbreviation which expresses both the termination of a DFA $A$ and its $\varphi$-correctness ($\varphi$-precision) with respect to a given program property $\varphi$.

Definition 2.2.9 (Correctness and Precision of a DFA).

Let $\varphi$ be a program property and $A$ be a DFA. Then $A$ is called correct (precise) for $\varphi$ if and only if $A$ is (i) terminating and (ii) $\varphi$-correct ($\varphi$-precise).



2.2.7 Forward, Backward, and Bidirectional DFA-Algorithms

An important classification of DFA-algorithms, which we have not considered yet, is induced by the direction of information flow. Typically, DFA-algorithms are grouped into forward, backward, and bidirectional algorithms depending on the direction of information flow (cf. [He]). A DFA-algorithm is called forward if information is propagated in the same direction as control flow, backward if information is propagated in the opposite direction of control flow, and bidirectional if information is propagated in both directions in a mutually dependent way. As usual, we have formulated our framework for forward analyses. However, backward analyses can be dealt with by forward analyses in our framework simply after inverting the flow of control. In contrast, bidirectional analyses cause more problems because they lack a natural operational (or MOP-) interpretation. Information flow paths, which were introduced by Dhamdhere and Khedker in order to characterize the information flow of bidirectional analyses, usually do not correspond to possible program executions (cf. [DK1, DK2]). In general, bidirectional problems are in fact conceptually and computationally more complex than unidirectional ones: in contrast to the unidirectional case, where reducible programs can be dealt with in $O(n \log(n))$ time (cf. [AU, GW, HU1, HU2, Ke2, KU1, Ta1, Ta2, Ta3, Ull]), the best known estimation for bidirectional analyses is $O(n^2)$ (cf. [Dh3, DK1, DRZ, DP]), where $n$ characterizes the size of the argument program (e.g., the number of statements).¹⁰ An elegant way to overcome this problem is to decompose bidirectional DFA-algorithms into sequences of unidirectional ones. Chapter 3 shows the result of such a decomposition: the originally bidirectional flow of DFA-information for code motion (cf. [Ch, Dh1, Dh2, Dh3, DS1, MR1, So]) is structured into a sequence of a backward analysis followed by a forward analysis [KRS1, KRS2], a decomposition which was first proposed by Steffen (cf. [St1]). Besides yielding clarity, this decomposition was also the key to opening the algorithm up for modifications. By enhancing it with two further unidirectional analyses, we arrived at a code motion algorithm which was the first one to achieve (intraprocedurally) computationally and lifetime optimal programs [KRS1, KRS2]. Intuitively, computational optimality means that the number of computations on each program path cannot be reduced any further by means of semantics-preserving code motion; lifetime optimality means that no computation has been moved unnecessarily far, i.e., without run-time gain. The flexibility resulting from the decomposition was also demonstrated by Drechsler and Stadel, who proposed a variant of the computationally and lifetime optimal algorithm which inserts computations on edges rather than in nodes (cf. [DS2]).

¹⁰ In [DK1] the complexity of bidirectional problems has been estimated by $O(n \cdot w)$, where $w$ denotes the width of a flow graph. In contrast to the well-known notion of depth (cf. [He]), on which traditional estimations are based, width is not a structural property of a flow graph but varies with the problem under consideration. In particular, it is larger for bidirectional problems than for unidirectional ones, and in the worst case it is linear in the size of the flow graph.



2.3 A Cookbook for Optimal Intraprocedural Program 
Optimization 

In this section we summarize the intraprocedural framework for optimal program optimization from the point of view of a designer constructing a program optimization. The point is to arrive at a presentation which supports the designer of a program optimization by structuring the construction process and hiding all details of the framework that are irrelevant for its application. In fact, following the lines of this section, the construction and the proof of optimality of a program transformation, as well as the proofs of precision of the corresponding DFA-algorithms, can be done in a cookbook style.

2.3.1 Optimal Program Optimization 

Fixing the Program Transformations and the Optimality Criterion. 

According to the two-step scheme of our approach to optimal program opti- 
mization, we first have to fix the class of program transformations and the 
optimality criterion of interest. Following Section 2.1.2, this requires: 



Define . . .

1. a set of appropriate program properties $\Phi$

2. the class of program transformations $\mathcal{T}$ of interest in terms of a subset $\Phi_C \subseteq \Phi$

3. a relation $\le_{\mathcal{T}} \subseteq \mathcal{G}_{\mathcal{T}} \times \mathcal{G}_{\mathcal{T}}$, which induces the optimality criterion of interest



The optimality criterion induced by $\le_{\mathcal{T}}$ is the criterion of $\mathcal{O}_{\le_{\mathcal{T}}}$-optimality in the sense of Definition 2.1.1, i.e.:

A transformation $Tr \in \mathcal{T}$ is $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal if $G_{Tr} \le_{\mathcal{T}} G_{Tr'}$ holds for all $Tr' \in \mathcal{T}$.

In general, $\le_{\mathcal{T}}$ will be a pre-order.

Fixing the Optimal Program Transformation. Next, the (optimal) program transformation must be defined. Similar to the class of program transformations $\mathcal{T}$, it is defined in terms of a subset of the properties of $\Phi$, i.e.:

Define . . .

4. the program transformation $Tr_{opt}$ of interest in terms of a subset $\Phi_T \subseteq \Phi$



Subsequently, we have to prove that $Tr_{opt}$ is a member of the transformation class under consideration and satisfies the optimality criterion of interest. Thus:

Prove . . .

5. $Tr_{opt} \in \mathcal{T}$

6. $Tr_{opt}$ is $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal



2.3.2 Precise Data Flow Analysis 

After proving the optimality of $Tr_{opt}$, we have to define, for each property $\varphi \in \Phi_T$ involved in the definition of $Tr_{opt}$, a DFA $A_\varphi$ computing the set of program points enjoying $\varphi$. Without loss of generality, we thus consider an arbitrary but fixed property $\varphi$ of $\Phi_T$ in the following.

Specifying the DFA $A_\varphi$. According to Section 2.2.6, the specification of the DFA $A_\varphi$ and the proof that it is precise for $\varphi$ require the following components:



Specify . . . 

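7. a complete semi-lattice $(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top)$

8. a local semantic functional $[\![\ ]\!] : N \to (\mathcal{C} \to \mathcal{C})$

9. a start information $c_s \in \mathcal{C}$

10. an interpretation $Int : \mathcal{C} \to \mathbb{B}$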



The lattice $\mathcal{C}$ represents the data flow information of interest, the local semantic functional gives meaning to the elementary statements of the argument program, and the start information $c_s$ represents the data flow information which is assumed to be valid immediately before the execution of the argument program starts. The function $Int$, finally, interprets the elements of $\mathcal{C}$ as Boolean truth values, which closes the gap between the data flow information computed and the program property $\varphi$ of interest.

Handling Backward Analyses: We recall that backward analyses can be handled by forward analyses simply after inverting the flow of control.
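The inversion can be sketched in Python as follows. To keep the example short, the sketch solves Equation System 2.2.3 by round-robin iteration starting from $\top$ rather than with the workset Algorithm 2.2.5, and it uses a simplified, textbook-style down-safety (anticipability) analysis for a single term as the backward problem: the entry of a node is down-safe if the node computes the term, or if it is transparent and its exit is down-safe; at the original end node the start information is $false$. This formulation, the Boolean lattice encoding ($false \sqsubseteq true$, meet is conjunction), and all names are assumptions; the precise definition of down-safety used in Chapter 3 may differ.

```python
from functools import reduce
from typing import Callable, Dict, Hashable, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def invert(edges: Set[Edge], s: Node, e: Node) -> Tuple[Set[Edge], Node, Node]:
    """Invert the flow of control: reverse every edge and swap start and end node."""
    return {(n, m) for (m, n) in edges}, e, s


def solve_forward(nodes: List[Node], edges: Set[Edge], local: Dict[Node, Callable],
                  s: Node, c_s, top, meet: Callable):
    """Greatest solution of Equation System 2.2.3 by round-robin iteration from top."""
    pred = {n: [m for (m, k) in edges if k == n] for n in nodes}
    pre = {n: top for n in nodes}
    post = {n: local[n](top) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_pre = c_s if n == s else reduce(meet, (post[m] for m in pred[n]), top)
            new_post = local[n](new_pre)
            if (new_pre, new_post) != (pre[n], post[n]):
                pre[n], post[n] = new_pre, new_post
                changed = True
    return pre, post


def down_safety(nodes, edges, s, e, comp, transp):
    """Backward problem solved forward on the inverted graph.

    On the inverted graph, pre[n] is the information at the exit of n and
    post[n] the information at the entry of n of the original program.
    """
    local = {n: (lambda c, n=n: comp[n] or (transp[n] and c)) for n in nodes}
    inv_edges, inv_s, _ = invert(edges, s, e)
    return solve_forward(nodes, inv_edges, local, inv_s, False, True,
                         lambda a, b: a and b)


# Hypothetical example: node "2" computes t, node "3" modifies an operand of t.
nodes = ["s", "1", "2", "3", "e"]
edges = {("s", "1"), ("1", "2"), ("1", "3"), ("2", "e"), ("3", "e")}
comp = {"s": False, "1": False, "2": True, "3": False, "e": False}
transp = {"s": True, "1": True, "2": True, "3": False, "e": True}
pre, post = down_safety(nodes, edges, "s", "e", comp, transp)
```

Here the entry of node `1` is not down-safe, because the path through node `3` reaches the end node without computing the term. Note that `s` and `e` behave as the identity, as required by the framework.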
Proving Precision of $A_\varphi$. At this stage, we have to verify that $A_\varphi$ is precise for $\varphi$ in the sense of Definition 2.2.9, i.e., that $A_\varphi$ is terminating and $\varphi$-precise. Applying Definition 2.2.7 and Theorem 2.2.4, the following proof steps are sufficient:

Prove . . . 

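11. the lattice $\mathcal{C}$ satisfies the descending chain condition

12. the local semantic functions $[\![ n ]\!]$, $n \in N$, are distributive

13. the specifying solution of $A_\varphi$ is $\varphi$-precise, i.e.:

(i) $\forall n \in N.\ Int(\textit{N-MOP}_{([\![\ ]\!], c_s)}(n)) \Leftrightarrow \textit{N-}\varphi(n)$

(ii) $\forall n \in N.\ Int(\textit{X-MOP}_{([\![\ ]\!], c_s)}(n)) \Leftrightarrow \textit{X-}\varphi(n)$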



Combining Definition 2.2.7, Theorem 2.2.4, and the propositions of steps 11, 12, and 13, we directly obtain the desired precision result.

Theorem 2.3.1 ($A_\varphi$-Precision).

$A_\varphi$ is precise for $\varphi$, i.e., $A_\varphi$ is terminating and $\varphi$-precise.

After proving Theorem 2.3.1 for each DFA $A_\varphi$, $\varphi \in \Phi_T$, we obtain that the transformation $Tr_{opt}$ and the transformation $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}$ induced by the algorithmic solutions of the DFA-algorithms $A_\varphi$, $\varphi \in \Phi_T$, coincide. Thus, we have the desired optimality result:

Theorem 2.3.2 ($\mathcal{O}$-Optimality).

The transformation $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}$ is $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal.

Monotonic DFA-Problems: Additional Proof Obligations. In contrast to distributive DFA-problems, which are characterized by the fact that the local semantic functions are distributive, monotonic DFA-problems, i.e., problems where the local semantic functions are monotonic (but not distributive), impose additional proof obligations. A prominent representative of this class of DFA-problems is the problem of computing the set of simple constants of a program (cf. [Ki1, Ki2, RL1, RL2]). Intuitively, a program term $t$ is a simple constant if it is a program constant, or if all its subterms are simple constants. The value $c$ of a simple constant $t$ can be computed at compile time. The original and possibly complex program term $t$ can then be replaced by its value $c$ in order to improve the run-time efficiency of the argument program.
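The loss of precision for monotonic problems can be reproduced in a few lines. The Python sketch below assumes a flat constant lattice per variable, with `TOP` for "no information yet" and `BOT` for "not constant" (an encoding chosen for illustration), and a node `z := x + y` after a join whose incoming branches set $(x, y)$ to $(2, 3)$ and $(3, 2)$, respectively. Along every path $z$ evaluates to $5$, but applying the local function after the meet loses this fact, so the function is monotonic but not distributive.

```python
from typing import Dict, Union

TOP, BOT = "TOP", "BOT"       # TOP: no information yet, BOT: not constant
Value = Union[int, str]
Env = Dict[str, Value]


def meet_value(a: Value, b: Value) -> Value:
    """Flat lattice: TOP meet x = x, c meet c = c, and c meet d = BOT for c != d."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT


def meet_env(e1: Env, e2: Env) -> Env:
    """Pointwise meet of two variable environments."""
    return {v: meet_value(e1[v], e2[v]) for v in e1}


def assign_z_to_x_plus_y(env: Env) -> Env:
    """Local semantics of the node z := x + y."""
    x, y = env["x"], env["y"]
    if x == BOT or y == BOT:
        z = BOT
    elif x == TOP or y == TOP:
        z = TOP
    else:
        z = x + y
    return {**env, "z": z}


# Information on the two branches joining before z := x + y.
left = {"x": 2, "y": 3, "z": TOP}
right = {"x": 3, "y": 2, "z": TOP}
```

Applying the function after the meet yields `BOT` for `z` (the MFP view), whereas meeting the results of the two paths yields `5` (the MOP view).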

For a monotonic DFA-problem, i.e., if step 12 holds only for monotonicity instead of distributivity, the Correctness Theorem 2.2.1 still yields the MOP-correctness of $A_\varphi$. Together with step 13, this even implies $\varphi$-correctness of $A_\varphi$. In general, however, this is not sufficient to guarantee that the program $G_{Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}}$ resulting from the transformation $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}$ is correct or even $\mathcal{O}_{\le_{\mathcal{T}}}$-optimal. Analogously, this also holds if step 13 holds only for $\varphi$-correctness instead of $\varphi$-precision. In both cases, the following two proof obligations must additionally be verified in order to guarantee correctness and profitability of the induced transformation:



Prove . . . 

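14. $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}} \in \mathcal{T}$

15. $G_{Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}} \le_{\mathcal{T}} G$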



Application: Intraprocedural Code Motion. We conclude this section with an outlook on the code motion application, which we consider in Chapter 3 for illustrating the usage of the cookbook. For this application the set of program properties $\Phi$ is basically given by the set of predicates safe, correct,
down-safe, earliest, latest, and isolated. The class of program transformations 
T is given by the set of admissible code motion transformations which are 
defined in terms of the predicates safe and correct. The predicates down-safe 
and earliest are used for defining the computationally optimal transformation 
of busy code motion, and the predicates latest and isolated for defining the 
computationally and lifetime optimal transformation of lazy code motion. 





3. Optimal Intraprocedural Code Motion: The 
Transformations 



In this chapter we consider a practically relevant optimization, the elimination of partially redundant computations in a procedure, in order to illustrate the two-step approach of our framework for optimal intraprocedural program optimization. In particular, we demonstrate how to apply the cookbook of Section 2.3. To this end we recall the busy and the lazy code motion transformations of [KRS1, KRS2] for partial redundancy elimination. They result in computationally and lifetime optimal programs, respectively. We remark that lifetime optimality implies computational optimality. The optimality theorems applying to the busy and the lazy code motion transformation were originally proved in [KRS1, KRS2]. Here, they are corollaries of their interprocedural counterparts proved in Chapter 10. We thus omit proofs in this chapter, which allows us to focus on illustrating the two-step approach. In the following we denote the transformations of busy and lazy code motion more briefly as BCM-transformation and LCM-transformation. The variants of the original transformations of [KRS1, KRS2] presented in the following are slightly enhanced in order to work on parallel assignments.

After introducing the basic definitions necessary for defining the BCM-transformation and the LCM-transformation in Section 3.1, we present the BCM-transformation and the theorems showing its computational optimality in Section 3.3. Subsequently, we present the LCM-transformation together with the theorems demonstrating its lifetime optimality in Section 3.4. In Section 3.5, finally, we demonstrate the power of both transformations by means of an illustrating example, which highlights their essential features.



3.1 Preliminaries 

We develop the BCM-transformation and the LCM-transformation with respect to an arbitrary but fixed flow graph $G = (N, E, s, e)$ and an arbitrary but fixed program term $t \in \mathbf{T}$. This allows us to use a simple and unparameterized notation. As usual, we assume that terms are inductively composed of variables, operators, and constants.

Parallel Assignments, Modifications, and Computations. We assume that the nodes of $G$ represent parallel assignments of the form $(x_1, \ldots, x_k) := (t_1, \ldots, t_k)$, where $i \ne j$ implies $x_i \ne x_j$. A parallel assignment is called a modification of $t$ if it assigns to an operand of $t$. It is called a computation of $t$ if $t$ is a subexpression of one of its right-hand side terms. Moreover, the occurrences of $t$ in computations of $G$ are called original computations.

J. Knoop: Optimal Interprocedural Program Optimization, LNCS 1428, pp. 31-48, 1998.
© Springer-Verlag Berlin Heidelberg 1998

Local Predicates Comp and Transp. Central for defining the BCM-transformation and the LCM-transformation are two local predicates Transp and Comp, which are defined for every node $n \in N$. Intuitively, these predicates indicate whether $t$ is modified or computed by the assignment of node $n$.¹

— Transp (n): n is transparent for t, i.e., n is not a modification of t. 

— Comp (n): n is a computation of t, i.e., n contains an original computation 
of t. 
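The two local predicates can be sketched for the parallel assignments described above. The term encoding (variables as strings, constants as integers, operator applications as tuples) and the function names are hypothetical. "Operand of $t$" is read as "variable occurring in $t$", and occurrences are matched syntactically, so `b + a` does not count as a computation of `a + b`.

```python
from typing import Iterator, Set, Tuple, Union

# Hypothetical term encoding: variables are strings, constants are ints,
# and an operator application is a tuple (operator, operand_1, ..., operand_k).
Term = Union[str, int, tuple]
Assignment = Tuple[Tuple[str, ...], Tuple[Term, ...]]   # (x_1..x_k) := (t_1..t_k)


def variables(t: Term) -> Set[str]:
    """The variables occurring in t."""
    if isinstance(t, str):
        return {t}
    if isinstance(t, tuple):
        result: Set[str] = set()
        for arg in t[1:]:
            result |= variables(arg)
        return result
    return set()


def subterms(t: Term) -> Iterator[Term]:
    """t itself and all of its subterms (operator symbols excluded)."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def transp(node: Assignment, t: Term) -> bool:
    """Transp(n): n assigns to none of the operands of t."""
    lhs, _ = node
    return not (set(lhs) & variables(t))


def comp(node: Assignment, t: Term) -> bool:
    """Comp(n): t is a subexpression of one of the right-hand side terms of n."""
    _, rhs = node
    return any(u == t for r in rhs for u in subterms(r))


t = ("+", "a", "b")
n1 = (("x", "a"), (("+", "a", "b"), 1))           # (x, a) := (a + b, 1)
n2 = (("y",), (("*", ("+", "a", "b"), "c"),))     # y := (a + b) * c
n3 = (("c",), (("+", "b", "a"),))                 # c := b + a
```

Node `n1` is both a computation and a modification of `a + b`, which is consistent with the parallel semantics: the right-hand sides are evaluated before any variable is assigned.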

Convention: As in [KRS2], we extend a predicate Predicate, which is defined on nodes $n$, to paths $p$ by means of the following convention:

— $Predicate^{\forall}(p) \iff_{df} \forall\, 1 \le i \le \lambda_p.\ Predicate(p_i)$

— $Predicate^{\exists}(p) \iff_{df} \exists\, 1 \le i \le \lambda_p.\ Predicate(p_i)$

Note that, according to this convention, the formulas $\neg Predicate^{\forall}(p)$ and $\neg Predicate^{\exists}(p)$ are abbreviations of the formulas $\exists\, 1 \le i \le \lambda_p.\ \neg Predicate(p_i)$ and $\forall\, 1 \le i \le \lambda_p.\ \neg Predicate(p_i)$, respectively.
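In Python the convention corresponds to `all` and `any` over the node occurrences of a path. The sketch below uses hypothetical names and data and also checks the first of the two abbreviations, which is an instance of De Morgan's laws. Note that for the empty path $\varepsilon$ the $\forall$-version holds and the $\exists$-version does not.

```python
from typing import Callable, Hashable, Sequence

Node = Hashable


def holds_forall(predicate: Callable[[Node], bool], p: Sequence[Node]) -> bool:
    """Predicate^forall(p): Predicate holds for every node occurrence p_i of p."""
    return all(predicate(n) for n in p)


def holds_exists(predicate: Callable[[Node], bool], p: Sequence[Node]) -> bool:
    """Predicate^exists(p): Predicate holds for some node occurrence p_i of p."""
    return any(predicate(n) for n in p)


# Hypothetical values of Transp along a path in which node "2" modifies t.
transp = {"s": True, "1": True, "2": False, "e": True}
p = ("s", "1", "2", "e")
```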

Critical Edges. In this section we recall the well-known fact that in completely arbitrary graph structures code motion may be blocked by critical edges, i.e., by edges leading from nodes with more than one successor to nodes with more than one predecessor (cf. [Dh1, Dh3, DRZ, DS1, KRS1, KRS2, KRS3, RWZ, SKR1, SKR2]). In order to exploit the full power of code motion, critical edges must be removed in the argument flow graph, as illustrated below.

In Figure 3.1(a) the computation of "$a + b$" at node 3 is partially redundant with respect to the computation of "$a + b$" at node 1. However, this partial redundancy cannot safely be eliminated by moving the computation of "$a + b$" to its preceding nodes, because this may introduce a new computation on a path leaving node 2 on the right branch. On the other hand, it can safely be eliminated after inserting a synthetic node 4 in the critical edge $(2, 3)$, as illustrated in Figure 3.1(b).

In the following we assume that all edges in G leading to a node with more 
than one predecessor have been split by inserting a synthetic node. Besides 
guaranteeing that all critical edges have been split, this simple transformation 
simplifies the process of code motion because computationally and lifetime 
optimal programs can now be obtained by moving all computations to node 

¹ In [MR1] the predicate Comp is called Antloc, which stands for “local anticipability”.
Fig. 3.1. Critical edges ((a) before and (b) after splitting the critical edge (2,3))



entries (cf. [KRS1]). In essence, this is a consequence of the Control Flow Lemma 3.1.1, which characterizes the branching structure of G after the insertion of synthetic nodes (cf. [KRS2]).

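Lemma 3.1.1 (Control Flow Lemma).

1. $\forall n \in N.\ |\mathit{pred}_G(n)| \ge 2 \Rightarrow \mathit{succ}_G(\mathit{pred}_G(n)) = \{n\}$

2. $\forall n \in N.\ |\mathit{succ}_G(n)| \ge 2 \Rightarrow \mathit{pred}_G(\mathit{succ}_G(n)) = \{n\}$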
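The normalization assumed above, i.e., splitting every edge that leads to a node with more than one predecessor, can be sketched in Python as follows, together with a direct check of both parts of the Control Flow Lemma. The graph is hypothetical and is assumed to contain no parallel edges; the helper names are illustrative.

```python
from itertools import count


def preds_of(succ):
    """Predecessor map of a successor map; every node must be a key of succ."""
    pred = {n: [] for n in succ}
    for m, ns in succ.items():
        for n in ns:
            pred[n].append(m)
    return pred


def split_join_edges(succ, fresh):
    """Insert a synthetic node on every edge leading to a node with >= 2 predecessors."""
    new = {m: list(ns) for m, ns in succ.items()}
    for n, ms in preds_of(succ).items():
        if len(ms) > 1:
            for m in ms:
                k = next(fresh)
                new[m] = [k if x == n else x for x in new[m]]
                new[k] = [n]
    return new


def control_flow_lemma_holds(succ):
    """Check parts 1 and 2 of Lemma 3.1.1 node by node."""
    pred = preds_of(succ)
    for n in succ:
        # 1. |pred(n)| >= 2  implies  succ(pred(n)) = {n}
        if len(pred[n]) >= 2 and any(set(succ[m]) != {n} for m in pred[n]):
            return False
        # 2. |succ(n)| >= 2  implies  pred(succ(n)) = {n}
        if len(succ[n]) >= 2 and any(set(pred[m]) != {n} for m in succ[n]):
            return False
    return True


succ = {0: [1, 2], 1: [3], 2: [3, 5], 3: [], 5: []}
normalized = split_join_edges(succ, count(6))
print(control_flow_lemma_holds(succ))        # False: (2, 3) is critical
print(control_flow_lemma_holds(normalized))  # True
```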



3.2 Intraprocedural Code Motion Transformations 

In accordance with the first step of our two-step approach for program optimization, we fix in this section the set of program transformations of interest, the set of admissible code motion transformations. In essence, code motion transformations are characterized by the following three-step procedure: (1) declare a new temporary h in the argument flow graph for storing the value of the computation under consideration, (2) insert assignments of the form h := t at some nodes of G, and (3) replace some of the original computations of t in G by h.

The first step of declaring the temporary is shared by all code motion transformations. Thus, the specification of a code motion transformation CM reduces to defining two predicates $\mathit{Insert}_{CM}$ and $\mathit{Replace}_{CM}$, which denote the sets of program points where an initialization must be inserted and where an original computation must be replaced, respectively. Without loss of generality we assume that $\mathit{Replace}_{CM}$ implies Comp, and that the conjunction of $\mathit{Insert}_{CM}$ and Comp implies $\mathit{Replace}_{CM}$. Note that this does not impose any restrictions on our approach. It only avoids transformations that keep an original computation even after an insertion into the corresponding node has made the original computation locally redundant.
• Declare a new temporary h for storing the value of t in the argument flow graph.

• Insert at the entry of every node satisfying $\mathit{Insert}_{CM}$ the assignment h := t.

• Replace every original computation of t in nodes satisfying $\mathit{Replace}_{CM}$ by h.

Table 3.1: Scheme of intraprocedural code motion transformations
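As a rough illustration of the scheme in Table 3.1, the following Python sketch applies a code motion transformation to a node-labeled program in which each node holds one assignment written as text. The textual representation, the function name `apply_code_motion`, and the example placement are assumptions; the sketch only performs the rewriting and does not check admissibility.

```python
def apply_code_motion(stmts, t, insert, replace, h="h"):
    """Apply the scheme of Table 3.1.

    stmts   : node -> assignment text such as "x := a + b"
    t       : the term under consideration, e.g. "a + b"
    insert  : nodes satisfying Insert_CM
    replace : nodes satisfying Replace_CM (assumed to imply Comp)
    """
    out = {}
    for n, s in stmts.items():
        body = []
        if n in insert:
            body.append(f"{h} := {t}")  # initialization at the node entry
        if n in replace:
            lhs, rhs = s.split(" := ")
            if t not in rhs:
                raise ValueError("Replace_CM must imply Comp")
            s = f"{lhs} := {rhs.replace(t, h)}"  # original computation replaced by h
        body.append(s)
        out[n] = body
    return out


# Hypothetical placement: node 1 precedes nodes 2 and 3 and does not modify a or b.
prog = {1: "x := c", 2: "y := a + b", 3: "z := a + b"}
print(apply_code_motion(prog, "a + b", insert={1}, replace={2, 3}))
# {1: ['h := a + b', 'x := c'], 2: ['y := h'], 3: ['z := h']}
```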

In the following we denote the set of all code motion transformations with respect to t, i.e., the set of all transformations matching the scheme of Table 3.1, by $\mathcal{CM}$. In the next section, we will restrict $\mathcal{CM}$ to the set of admissible transformations, which are guaranteed to preserve the semantics of their argument programs. We consider this a necessary constraint of optimizing program transformations.

3.2.1 Admissible Transformations 

A code motion transformation CM is admissible if it preserves the semantics of its argument programs. Intuitively, this requires that CM is safe and correct. “Safe” means that there is no program path on which the computation of a new value is introduced by inserting a computation of t; “correct” means that h always represents the same value as t when it is replaced by h. Formally, two computations of t represent the same value on a path if and only if no operand is modified between them. This is reflected in the following definition, which defines when inserting and replacing a computation of t is safe and correct in a node $n \in N$.

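Definition 3.2.1 (Safety and Correctness).

For all nodes $n \in N$ we define:

1. Safety:

$$\mathit{Safe}(n) \Longleftrightarrow_{df} \forall p \in \mathbf{P}_G[s,e]\ \forall j.\ (p_j = n) \Rightarrow \big[\, \big(\exists i < j.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[i,j[)\big) \vee \big(\exists i \ge j.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[j,i[)\big) \,\big]$$

2. Strong Safety: $\textit{S-Safe}(n) \Longleftrightarrow_{df} \textit{N-USafe}(n) \vee \textit{N-DSafe}(n)$, where

a) $\textit{N-USafe}(n) \Longleftrightarrow_{df} \forall p \in \mathbf{P}_G[s,n]\ \exists i < \lambda_p.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[i,\lambda_p[)$

b) $\textit{N-DSafe}(n) \Longleftrightarrow_{df} \forall p \in \mathbf{P}_G[n,e]\ \exists i \le \lambda_p.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)$

3. Correctness: Let $CM \in \mathcal{CM}$. Then:

$$\mathit{Correct}_{CM}(n) \Longleftrightarrow_{df} \forall p \in \mathbf{P}_G[s,n]\ \exists i \le \lambda_p.\ \mathit{Insert}_{CM}(p_i) \wedge \mathit{Transp}^{\forall}(p[i,\lambda_p[)$$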

In the intraprocedural setting, there is no difference between safety and strong safety. The backward implication is quite obvious from the definitions of safety and strong safety. The converse implication is essentially a consequence of the fact that we are only considering nondeterministic branching, which makes the set of all paths in G a regular set. Each path in $\mathbf{P}_G[s,n]$ leading to a node n can be completed by every program continuation starting in n, i.e., by every path in $\mathbf{P}_G[n,e]$. In particular, each path violating the up-safety condition at a node n can be linked to every path violating the down-safety condition at this node. This yields a path violating the safety condition at n, and proves the contrapositive of the forward implication. As we will see in Chapter 10, this is an important difference from the interprocedural setting. Although we consider nondeterministic branching in the interprocedural setting as well, this equivalence is lost there. Essentially, this is a consequence of the fact that the set of interprocedurally valid program paths is a context-free set (cf. Chapter 7).

Lemma 3.2.1 (Safety Lemma).

$$\forall n \in N.\ \mathit{Safe}(n) \Longleftrightarrow \textit{S-Safe}(n)$$

The predicates for safety and correctness allow us to define the class of ad- 
missible code motion transformations. 

Definition 3.2.2 (Admissible Code Motion).

A code motion transformation $CM \in \mathcal{CM}$ is admissible if and only if every node $n \in N$ satisfies the following two conditions:

1. $\mathit{Insert}_{CM}(n) \Rightarrow \mathit{Safe}(n)$

2. $\mathit{Replace}_{CM}(n) \Rightarrow \mathit{Correct}_{CM}(n)$

The set of all admissible code motion transformations is denoted by $\mathcal{CM}_{Adm}$.



We have (cf. [KRS2]): 

Lemma 3.2.2 (Correctness Lemma).

$$\forall CM \in \mathcal{CM}_{Adm}\ \forall n \in N.\ \mathit{Correct}_{CM}(n) \Rightarrow \mathit{Safe}(n)$$

Note that admissibility of a code motion transformation does not require 
that insertions are really used. Condition (2) of Definition 3.2.2 holds triv- 
ially if the predicate Replace is false for every node. Thus, we additionally 
introduce the subset of canonic transformations. Intuitively, an admissible 
code motion transformation is canonic, if insertions are used on every pro- 
gram continuation. Canonic transformations are particularly important in 
the interprocedural setting. 
Definition 3.2.3 (Canonic Code Motion).

An admissible code motion transformation $CM \in \mathcal{CM}_{Adm}$ is canonic if and only if for every node $n \in N$ the following condition holds:

$$\mathit{Insert}_{CM}(n) \Rightarrow \forall p \in \mathbf{P}_G[n,e]\ \exists\, 1 \le i \le \lambda_p.\ \mathit{Replace}_{CM}(p_i) \wedge \neg\mathit{Insert}_{CM}^{\exists}(p]1,i])$$

We denote the set of all canonic code motion transformations by $\mathcal{CM}_{Can}$. Programs resulting from a transformation of $\mathcal{CM}_{Can}$ are called canonic, too.



Having now fixed the set of program transformations of interest, we pro- 
ceed further along the lines of our two-step approach for optimal program 
optimization, and introduce next the optimality criteria of interest: compu- 
tational and lifetime optimality. 

3.2.2 Computationally Optimal Transformations 

The primary goal of code motion is to minimize the number of computations on every program path. This intent is reflected by the relation “computationally better”. It requires the local predicate $\mathit{Comp}_{CM}$, which is true for nodes containing a computation of t after applying the code motion transformation it is annotated with, i.e.:

$$\mathit{Comp}_{CM}(n) =_{df} \mathit{Insert}_{CM}(n) \vee (\mathit{Comp}(n) \wedge \neg\mathit{Replace}_{CM}(n))$$

Using this notation, a code motion transformation $CM \in \mathcal{CM}_{Adm}$ is computationally better than a code motion transformation $CM' \in \mathcal{CM}_{Adm}$ if and only if

$$\forall p \in \mathbf{P}_G[s,e].\ |\{\, i \mid \mathit{Comp}_{CM}(p_i) \,\}| \le |\{\, i \mid \mathit{Comp}_{CM'}(p_i) \,\}|$$

Note that the relation “computationally better” is reflexive. Thus, “computationally at least as good” would be the more precise but uglier term. Nonetheless, by means of this relation, we can now define:
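The relation “computationally better” compares the number of nodes satisfying $\mathit{Comp}_{CM}$ on every program path. Since a flow graph with loops has infinitely many paths, the Python sketch below only checks the inequality on a finite sample of paths supplied by the caller, so it can refute the relation but cannot establish it in general. As an assumption of the sketch, a transformation is represented by the pair of its Insert and Replace node sets; the diamond-shaped example is hypothetical.

```python
def comp_after(n, comp, cm):
    """Comp_CM(n) = Insert_CM(n) or (Comp(n) and not Replace_CM(n))."""
    insert, replace = cm
    return n in insert or (n in comp and n not in replace)


def computations_on(path, comp, cm):
    """Number of occurrences of t on the path after applying cm."""
    return sum(comp_after(n, comp, cm) for n in path)


def computationally_better(cm1, cm2, comp, paths):
    """cm1 is at least as good as cm2 on every path of the given finite sample."""
    return all(computations_on(p, comp, cm1) <= computations_on(p, comp, cm2)
               for p in paths)


# Hypothetical diamond 1 -> {2, 3} -> 4 -> 5 with a + b computed at nodes 3 and 5.
comp = {3, 5}
paths = [[1, 2, 4, 5], [1, 3, 4, 5]]
identity = (set(), set())  # no insertions, no replacements
moved = ({2, 3}, {3, 5})   # h := a + b at nodes 2 and 3, uses at nodes 3 and 5
print(computationally_better(moved, identity, comp, paths))  # True
print(computationally_better(identity, moved, comp, paths))  # False
```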

Definition 3.2.4 (Computationally Optimal Code Motion).

An admissible code motion transformation $CM \in \mathcal{CM}_{Adm}$ is computationally optimal if and only if it is computationally better than any other admissible code motion transformation. The set of all computationally optimal code motion transformations is denoted by $\mathcal{CM}_{CmpOpt}$.

Intraprocedurally, computationally optimal code motion transformations are 
canonic. In fact, we have: 

Theorem 3.2.1 (Computational Optimality and Canonicity).

$$\mathcal{CM}_{CmpOpt} \subseteq \mathcal{CM}_{Can}$$
As we will demonstrate in Chapter 10, this theorem does not carry over to the interprocedural setting, which reveals an important and essential difference between intraprocedural and interprocedural code motion. Moreover, canonicity will be shown to be the key to successfully extending the intraprocedural code motion techniques to the interprocedural setting.



3.2.3 Lifetime Optimal Transformations 

Besides its primary goal of avoiding unnecessary recomputations of values, code motion has the secondary goal of avoiding unnecessarily far motions of computations, because these can cause superfluous register pressure. This requires minimizing the lifetimes of the temporaries introduced by a computationally optimal code motion transformation. Intuitively, a computationally optimal code motion transformation is lifetime optimal if the lifetime ranges of its temporaries are minimal. In essence, a lifetime range is a path from an initialization site to a use site of h which is free of redefinitions of h.

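Definition 3.2.5 (Lifetime Ranges).

Let $CM \in \mathcal{CM}_{Adm}$. The set of lifetime ranges of CM is defined by

$$\mathit{LtRg}(CM) =_{df} \{\, p \mid \mathit{Insert}_{CM}(p_1) \wedge \mathit{Replace}_{CM}(p_{\lambda_p}) \wedge \neg\mathit{Insert}_{CM}^{\exists}(p]1,\lambda_p]) \,\}$$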

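This leads us to the following definition: a code motion transformation $CM \in \mathcal{CM}_{Adm}$ is lifetime better than a code motion transformation $CM' \in \mathcal{CM}_{Adm}$ if and only if

$$\forall p \in \mathit{LtRg}(CM)\ \exists q \in \mathit{LtRg}(CM').\ p \sqsubseteq q$$

where $p \sqsubseteq q$ means that p is a subpath of q.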
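Lifetime ranges can likewise be inspected along concrete program paths. The following Python sketch extracts, for one path, the index intervals [i, j] that form lifetime ranges, and then checks “lifetime better” in a restricted form: every range of the first transformation must lie inside a range of the second one on the same sampled path. This restriction is an assumption of the sketch. Containment on the same path implies $p \sqsubseteq q$, but the definition also allows q to lie on a different path, and only the supplied paths are examined. The example is a hypothetical diamond 1 → {2, 3} → 4 → 5 with “a + b” computed at nodes 3 and 5.

```python
def lifetime_ranges(path, cm):
    """Index intervals (i, j) of `path` that are lifetime ranges of cm:
    Insert_CM at path[i], Replace_CM at path[j], no insertion in ]i, j]."""
    insert, replace = cm
    ranges = []
    for i, n in enumerate(path):
        if n not in insert:
            continue
        for j in range(i, len(path)):
            if j > i and path[j] in insert:
                break  # h is re-initialized, so the range cannot extend further
            if path[j] in replace:
                ranges.append((i, j))
    return ranges


def lifetime_better(cm1, cm2, paths):
    """Every range of cm1 lies inside some range of cm2 on the same sampled path."""
    for p in paths:
        outer = lifetime_ranges(p, cm2)
        for i, j in lifetime_ranges(p, cm1):
            if not any(k <= i and j <= l for k, l in outer):
                return False
    return True


# Both placements below are computationally optimal for this diamond,
# but they differ in their lifetime ranges.
paths = [[1, 2, 4, 5], [1, 3, 4, 5]]
early = ({1}, {3, 5})
late = ({2, 3}, {3, 5})
print(lifetime_ranges([1, 3, 4, 5], early))  # [(0, 1), (0, 3)]
print(lifetime_ranges([1, 3, 4, 5], late))   # [(1, 1), (1, 3)]
print(lifetime_better(late, early, paths))   # True
print(lifetime_better(early, late, paths))   # False
```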

Analogously to the notion of computationally optimal transformations, we 
can now define the notion of lifetime optimal transformations. 

Definition 3.2.6 (Lifetime Optimal Code Motion).

A computationally optimal code motion transformation $CM \in \mathcal{CM}_{CmpOpt}$ is lifetime optimal if and only if it is lifetime better than any other computationally optimal code motion transformation. The set of all lifetime optimal code motion transformations is denoted by $\mathcal{CM}_{LtOpt}$.

Intuitively, lifetime optimality guarantees that no computation has been moved without run-time gain. Thus, there is no superfluous register pressure due to unnecessary code motion. In contrast to computational optimality, however, which can be achieved by several admissible code motion transformations, lifetime optimality is achieved by at most one transformation, as shown by Theorem 3.2.2 (cf. [KRS2]).
Theorem 3.2.2 (Uniqueness of Lifetime Optimal Code Motion).

$$|\mathcal{CM}_{LtOpt}| \le 1$$

We conclude this section by recalling the notion of first-use-lifetime ranges. They are important for the optimality proofs of the BCM-transformation and the LCM-transformation.

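Definition 3.2.7 (First-Use-Lifetime Ranges).

Let $CM \in \mathcal{CM}_{Adm}$. We define

$$\textit{FU-LtRg}(CM) =_{df} \{\, p \in \mathit{LtRg}(CM) \mid \forall q \in \mathit{LtRg}(CM).\ q \sqsubseteq p \Rightarrow q = p \,\}$$

We have (cf. [KRS2]):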

Lemma 3.2.3 (First-Use-Lifetime Range Lemma).

Let $CM \in \mathcal{CM}_{Adm}$, $p \in \mathbf{P}_G[s,e]$, and $q_1, q_2 \in \textit{FU-LtRg}(CM)$ with $q_1 \sqsubseteq p$ and $q_2 \sqsubseteq p$. Then either

- $q_1 = q_2$ or

- $q_1$ and $q_2$ are disjoint, i.e., they do not have any node occurrence in common.

We now continue to follow the lines of Section 2.3. We define two program transformations, the busy and the lazy code motion transformation, which will be shown to satisfy the optimality criteria introduced above: they are computationally and lifetime optimal, respectively.



3.3 The BCM-Transformation

In this section we recall the definition of the BCM-transformation. It is based 
on the predicates for down-safety and earliestness. 

3.3.1 Specification 

Intuitively, a node n is down-safe at its entry, if on every terminating pro- 
gram path starting in n there is a computation of t which is not preceded 
by a modification of t. Analogously, it is down-safe at its exit, if on every 
terminating path starting in a successor of n there is a computation of t 
which is not preceded by a modification of t. 

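Definition 3.3.1 (Down-Safety).

Let $n \in N$. n is

1. entry-down-safe [in signs: $\textit{N-DSafe}(n)$] $\Longleftrightarrow_{df}$

$$\forall p \in \mathbf{P}_G[n,e]\ \exists i \le \lambda_p.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)$$

2. exit-down-safe [in signs: $\textit{X-DSafe}(n)$] $\Longleftrightarrow_{df}$

$$\forall p \in \mathbf{P}_G]n,e]\ \exists i \le \lambda_p.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)$$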

Intuitively, a computation is earliest at a node n, if an “earlier” computation 
would not deliver the same value due to a subsequent modification or would 
not be down-safe. In other words, a computation is earliest if there is a path 
from s to n, where no node prior to n is down-safe and delivers the same 
value as n when computing t. 

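Definition 3.3.2 (Earliestness).

Let $n \in N$. n is

1. entry-earliest [in signs: $\textit{N-Earliest}(n)$] $\Longleftrightarrow_{df}$

$$\exists p \in \mathbf{P}_G[s,n[\ \forall i \le \lambda_p.\ \textit{N-DSafe}(p_i) \Rightarrow \neg\mathit{Transp}^{\forall}(p[i,\lambda_p])$$

2. exit-earliest [in signs: $\textit{X-Earliest}(n)$] $\Longleftrightarrow_{df}$

$$\exists p \in \mathbf{P}_G[s,n]\ \forall i \le \lambda_p.\ \textit{N-DSafe}(p_i) \Rightarrow \neg\mathit{Transp}^{\forall}(p[i,\lambda_p])$$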

Abbreviating N-DSafe and N-Earliest by DSafe and Earliest, respectively, Table 3.2 shows the definition of the predicates $\mathit{Insert}_{BCM}$ and $\mathit{Replace}_{BCM}$, which specify the BCM-transformation.

• $\forall n \in N.\ \mathit{Insert}_{BCM}(n) =_{df} \mathit{DSafe}(n) \wedge \mathit{Earliest}(n)$

• $\forall n \in N.\ \mathit{Replace}_{BCM}(n) =_{df} \mathit{Comp}(n)$

Table 3.2: The BCM-transformation

Intuitively, the BCM-transformation places the computations as early as possible in a program while maintaining safety.

3.3.2 Proving Computational Optimality 

The BCM-transformation yields computationally optimal programs. Central for proving this result is the following lemma (cf. [KRS2]).

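Lemma 3.3.1 (BCM-Lemma).

1. $\forall n \in N.\ \mathit{Insert}_{BCM}(n) \Longleftrightarrow \mathit{Safe}(n) \wedge \bigwedge_{m \in \mathit{pred}_G(n)} \neg(\mathit{Transp}(m) \wedge \mathit{Safe}(m))$

2. $\forall n \in N.\ \mathit{Correct}_{BCM}(n) \Longleftrightarrow \mathit{Safe}(n)$

3. $\forall p \in \mathbf{P}_G[s,e]\ \forall i \le \lambda_p.\ \mathit{Insert}_{BCM}(p_i) \Rightarrow \exists j \ge i.\ p[i,j] \in \textit{FU-LtRg}(BCM)$

4. $\forall CM \in \mathcal{CM}_{Adm}\ \forall p \in \mathit{LtRg}(BCM).\ \neg\mathit{Replace}_{CM}(p_{\lambda_p}) \vee \mathit{Insert}_{CM}^{\exists}(p)$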
By means of the BCM-Lemma 3.3.1, we obtain as desired (cf. [KRS2]):

Theorem 3.3.1 (BCM-Optimality).

The BCM-transformation is computationally optimal, i.e., $BCM \in \mathcal{CM}_{CmpOpt}$.



3.4 The LCM-Transformation

In this section we recall the definition of the LCM-transformation. It is based 
on the predicates for latestness and isolation. 

3.4.1 Specification 

In order to avoid any code motion without run-time gain, and therefore any superfluous register pressure, computations must be placed as late as possible in a program while maintaining computational optimality. Intuitively, this requires moving the insertions of the BCM-transformation in the direction of the control flow to “later” program points. This leads us to the notion of delayability.

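Definition 3.4.1 (Delayability).

Let $n \in N$. n is

1. entry-delayable [in signs: $\textit{N-Delayable}(n)$] $\Longleftrightarrow_{df}$

$$\forall p \in \mathbf{P}_G[s,n]\ \exists i \le \lambda_p.\ \mathit{Insert}_{BCM}(p_i) \wedge \neg\mathit{Comp}^{\exists}(p[i,\lambda_p[)$$

2. exit-delayable [in signs: $\textit{X-Delayable}(n)$] $\Longleftrightarrow_{df}$

$$\forall p \in \mathbf{P}_G[s,n]\ \exists i \le \lambda_p.\ \mathit{Insert}_{BCM}(p_i) \wedge \neg\mathit{Comp}^{\exists}(p[i,\lambda_p])$$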

The following definition characterizes the set of program points where an insertion of the BCM-transformation is “maximally delayed”.

Definition 3.4.2 (Latestness).

A node $n \in N$ is latest if it satisfies the predicate Latest defined by

$$\mathit{Latest}(n) =_{df} \textit{N-Delayable}(n) \wedge \Big(\mathit{Comp}(n) \vee \neg\bigwedge_{m \in \mathit{succ}_G(n)} \textit{N-Delayable}(m)\Big)$$

Intuitively, a node n is latest if an insertion of t can be delayed to its entry (i.e., $\textit{N-Delayable}(n)$), and if its further delay is blocked, either by an original computation of t (i.e., $\mathit{Comp}(n)$), since it does not make sense to initialize after a use site, or because the process of moving the insertions of the BCM-transformation to later program points fails for some of the successors of n (i.e., $\neg\bigwedge_{m \in \mathit{succ}_G(n)} \textit{N-Delayable}(m)$).
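Both predicates can be computed by iterative analyses. The following Python sketch is an assumption of this presentation rather than a transcription of an algorithm from the text: it computes N-Delayable and X-Delayable as the greatest solution of the forward equations $\textit{N-Delayable}(n) = \mathit{Insert}_{BCM}(n) \vee (n \ne s \wedge \bigwedge_{m \in \mathit{pred}_G(n)} \textit{X-Delayable}(m))$ and $\textit{X-Delayable}(n) = \textit{N-Delayable}(n) \wedge \neg\mathit{Comp}(n)$, and then evaluates Latest as in Definition 3.4.2. It assumes that the start node s has no predecessors. The example graph is hypothetical: synthetic nodes 6 and 7 sit on the edges into the join node 4, all nodes are transparent, t is computed at nodes 3 and 5, and the BCM-transformation inserts at s = 1.

```python
def preds_of(succ):
    pred = {n: [] for n in succ}
    for m, ns in succ.items():
        for n in ns:
            pred[n].append(m)
    return pred


def delayability(succ, start, insert_bcm, comp):
    """Greatest solution of the forward delayability equations (start has no preds)."""
    pred = preds_of(succ)
    n_del = {n: True for n in succ}
    x_del = {n: True for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            inherited = False if n == start else all(x_del[m] for m in pred[n])
            entry = n in insert_bcm or inherited
            exit_ = entry and n not in comp  # an original computation blocks delay
            if (entry, exit_) != (n_del[n], x_del[n]):
                n_del[n], x_del[n] = entry, exit_
                changed = True
    return n_del, x_del


def latest(succ, n_del, comp):
    """Latest(n) = N-Delayable(n) and (Comp(n) or not all successors N-Delayable)."""
    return {n for n in succ
            if n_del[n] and (n in comp or not all(n_del[m] for m in succ[n]))}


succ = {1: [2, 3], 2: [6], 3: [7], 4: [5], 5: [], 6: [4], 7: [4]}
n_del, _ = delayability(succ, start=1, insert_bcm={1}, comp={3, 5})
print(sorted(n for n in succ if n_del[n]))  # [1, 2, 3, 6]
print(sorted(latest(succ, n_del, {3, 5})))  # [3, 6]
```

The insertion at node 1 is thus delayed to node 3 on the right branch and to the synthetic node 6 on the left branch.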

In order to avoid insertions whose value can only be used in the insertion node itself, we define a predicate identifying “unusable” insertion points, i.e., program points lacking a terminating program continuation with a computation of t that is not preceded by a redefinition of h.




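Definition 3.4.3 (Unusability).

Let $n \in N$. n is

1. entry-unusable [in signs: $\textit{N-Unusable}(n)$] $\Longleftrightarrow_{df}$

$$\forall p \in \mathbf{P}_G[n,e]\ \forall i \le \lambda_p.\ \mathit{Comp}(p_i) \Rightarrow \mathit{Latest}^{\exists}(p[1,i])$$

2. exit-unusable [in signs: $\textit{X-Unusable}(n)$] $\Longleftrightarrow_{df}$

$$\forall p \in \mathbf{P}_G]n,e]\ \forall i \le \lambda_p.\ \mathit{Comp}(p_i) \Rightarrow \mathit{Latest}^{\exists}(p[1,i])$$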

Computations that are exit-unusable can only be used in the node itself. We therefore call them “isolated”.

Definition 3.4.4 (Isolation).

A node $n \in N$ is isolated if it satisfies the predicate Isolated defined by

$$\mathit{Isolated}(n) =_{df} \textit{X-Unusable}(n)$$

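Table 3.3 shows the definition of the predicates $\mathit{Insert}_{LCM}$ and $\mathit{Replace}_{LCM}$, which specify the LCM-transformation.

• $\forall n \in N.\ \mathit{Insert}_{LCM}(n) =_{df} \mathit{Latest}(n) \wedge \neg\mathit{Isolated}(n)$

• $\forall n \in N.\ \mathit{Replace}_{LCM}(n) =_{df} \mathit{Comp}(n) \wedge \neg(\mathit{Latest}(n) \wedge \mathit{Isolated}(n))$

Table 3.3: The LCM-transformation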
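In the same spirit, the Python sketch below derives the LCM predicates of Table 3.3 from given sets of latest and computing nodes. Unusability is computed as the greatest solution of the backward equations $\textit{N-Unusable}(n) = \mathit{Latest}(n) \vee (\neg\mathit{Comp}(n) \wedge \textit{X-Unusable}(n))$ and $\textit{X-Unusable}(n) = \bigwedge_{m \in \mathit{succ}_G(n)} \textit{N-Unusable}(m)$. These equations are an assumption of the sketch, obtained by unfolding Definition 3.4.3 one node at a time. The first example is a hypothetical diamond with synthetic nodes 6 and 7, Latest = {3, 6}, and t computed at nodes 3 and 5; the second is a hypothetical straight-line program whose only computation is isolated.

```python
def lcm_predicates(succ, latest, comp):
    """Return (Insert_LCM, Replace_LCM) as node sets, following Table 3.3."""
    n_unus = {n: True for n in succ}
    x_unus = {n: True for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # exit-unusable: every successor is entry-unusable (vacuous at the end node)
            x = all(n_unus[m] for m in succ[n])
            e = n in latest or (n not in comp and x)
            if (e, x) != (n_unus[n], x_unus[n]):
                n_unus[n], x_unus[n] = e, x
                changed = True
    isolated = {n for n in succ if x_unus[n]}
    insert = {n for n in latest if n not in isolated}
    replace = {n for n in comp if not (n in latest and n in isolated)}
    return insert, replace


succ = {1: [2, 3], 2: [6], 3: [7], 4: [5], 5: [], 6: [4], 7: [4]}
insert, replace = lcm_predicates(succ, latest={3, 6}, comp={3, 5})
print(sorted(insert), sorted(replace))  # [3, 6] [3, 5]

line = {1: [2], 2: [3], 3: []}
print(lcm_predicates(line, latest={3}, comp={3}))  # (set(), set())
```

In the second example the computation at node 3 is both latest and isolated, so it is left untouched, which matches the intent of avoiding code motion without run-time gain.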

As mentioned earlier, the point of the LCM-transformation is to place computations as late as possible in a program while maintaining computational optimality. That it achieves this goal is the main result of the following section.



3.4.2 Proving Lifetime Optimality 

The LCM-transformation yields lifetime optimal programs. The following 
lemma is the key for proving this result (cf. [KRS2]). 

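Lemma 3.4.1 (LCM-Lemma).

1. $\forall n \in N.\ \textit{N-Delayable}(n) \Rightarrow \textit{N-DSafe}(n)$

2. $\forall p \in \mathbf{P}_G[s,e]\ \forall i \le \lambda_p.\ \textit{N-Delayable}(p_i) \Rightarrow \exists k \le i \le j.\ p[k,j] \in \textit{FU-LtRg}(BCM)$

3. $\forall p \in \textit{FU-LtRg}(BCM).\ \mathit{Latest}^{\exists}(p)$

4. $\forall p \in \mathit{LtRg}(BCM)\ \forall i \le \lambda_p.\ \mathit{Latest}(p_i) \Rightarrow \neg\textit{N-Delayable}^{\exists}(p]i,\lambda_p])$

5. $\forall CM \in \mathcal{CM}_{CmpOpt}\ \forall n \in N.\ \mathit{Comp}_{CM}(n) \Rightarrow \textit{N-Delayable}(n)$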

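6. $\forall p \in \mathbf{P}_G[s,e]\ \forall i \le \lambda_p.\ \langle p_i \rangle \in \mathit{LtRg}(LCM) \Rightarrow \exists p' \in \mathbf{P}_G[s,e].\ p[1,i] = p'[1,i] \wedge \exists j > i.\ p'[i,j] \in \mathit{LtRg}(LCM)$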
By means of the LCM-Lemma 3.4.1 we can prove (cf. [KRS2]):

Theorem 3.4.1 (LCM-Optimality).

The LCM-transformation is computationally and lifetime optimal, i.e., $LCM \in \mathcal{CM}_{LtOpt}$.

As a corollary of Theorem 3.2.2 and Theorem 3.4.1 we immediately obtain:

Corollary 3.4.1. $\mathcal{CM}_{LtOpt} = \{ LCM \}$



3.5 An Illustrating Example 

In this section we demonstrate the power of the BCM-transformation and the LCM-transformation by discussing the motivating example of [KRS1] displayed in Figure 3.2. The program fragment of this example is complex enough to illustrate the essential features of the two transformations. In order to keep the presentation as simple as possible, synthetic nodes that do not occur as insertion points of the BCM-transformation and the LCM-transformation are omitted.

Figure 3.3 shows the result of computing the set of down-safe and of 
earliest program points. They induce the insertion points of the BCM- 
transformation. The result of this transformation is displayed in Figure 3.4. 
Note that the flow graph of this figure is indeed computationally optimal. 

Subsequently, Figure 3.5 and Figure 3.6 show the results of computing the sets of delayable and latest, and of latest and isolated program points, respectively. They induce the insertion and replacement points of the LCM-transformation. The result of this transformation is shown in Figure 3.7. It is noteworthy because it eliminates the partially redundant computations of “a + b” at node 10 and node 16 by moving them to node 8 and node 15, but it does not touch the computations of “a + b” in node 3 and node 17, which cannot be moved with run-time gain. For the example under consideration, this confirms that the LCM-transformation moves computations only when it is profitable. Note that the flow graph of Figure 3.7 is in fact computationally and lifetime optimal.

We recall that the algorithm for lazy code motion of [KRS1, KRS2] was the first algorithm satisfying these optimality criteria.




4. Optimal Intraprocedural Code Motion: The DFA-Algorithms



In this chapter we specify for every program property involved in the BCM-transformation and the LCM-transformation a corresponding DFA-algorithm. The specifications follow the cookbook style of Section 2.3. Thus, every DFA-algorithm is defined by a lattice of data flow information, a local semantic functional, a start information, and an interpretation of the lattice elements in the set of Boolean truth values. All DFA-algorithms are precise for the program property they are designed for. The corresponding precision theorems are corollaries of their interprocedural counterparts proved in Chapter 11. Nevertheless, we present the proofs of the intraprocedural theorems in full detail: first, to demonstrate the analogies and differences to the interprocedural setting, and second, to demonstrate the similarity of precision proofs for different properties. According to the recipe of Section 2.3, this reduces for every DFA-algorithm to proving that the lattice satisfies the descending chain condition, that the local semantic functions are distributive, and that the meet over all paths solution is precise for the program property under consideration. In contrast to [KRS1, KRS2], the central step of the precision proofs is based on the Coincidence Theorem 2.2.2. For convenience, we identify throughout this chapter the specification of a DFA and the DFA-algorithm it induces (cf. Definition 2.2.6).



4.1 DFA-Algorithm $\mathcal{A}_{ds}$: Down-Safety

In this section we present the DFA-algorithm $\mathcal{A}_{ds}$ for computing the set of down-safe program points.¹ The main result applying to this algorithm is the $\mathcal{A}_{ds}$-Precision Theorem 4.1.2. It guarantees that the algorithm is precise for this program property: it terminates with exactly the set of program points that are down-safe in the sense of Definition 3.3.1.

We recall that down-safety requires a backward analysis of the program 
under consideration. This is reflected in the definition of the local semantic 
functional (cf. Section 4.1.1), and in the fact that the start information is 
attached to the end node (cf. Section 4.1.1). 



¹ The index ds stands for down-safety.



J. Knoop: Optimal Interprocedural Program Optimization, LNCS 1428, pp. 49-67, 1998. 
4.1.1 Specification

Data Flow Information. The domain of data flow information of $\mathcal{A}_{ds}$ is given by the lattice of Boolean truth values

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathbb{B}, \wedge, \le, \mathit{false}, \mathit{true})$$

with false < true. Intuitively, the data flow information attached to a pro- 
gram point expresses, whether a placement of t is down-safe at this point. 

Local Semantic Functional. The local semantic functional $[\![\ ]\!]_{ds} : N \to (\mathbb{B} \to \mathbb{B})$ of $\mathcal{A}_{ds}$ is defined by

$$\forall n \in N\ \forall b \in \mathbb{B}.\ [\![ n ]\!]_{ds}(b) =_{df} \mathit{Comp}(n) \vee (b \wedge \mathit{Transp}(n))$$

Intuitively, a placement of t is down-safe at the entry of a node n if it is computed in n (i.e., $\mathit{Comp}(n)$), or if it is down-safe at its exit (i.e., b = true) and none of its operands is modified by n (i.e., $\mathit{Transp}(n)$).
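A direct implementation of $\mathcal{A}_{ds}$ as a round-robin iteration might look as follows. This Python sketch assumes that the flow graph is a successor map with a unique end node e, that Comp and Transp are given as node sets, and that the start information false is attached to the exit of e. Iteration starts from the top element true, so it stabilizes at the greatest solution. The example graph is hypothetical: node 2 modifies an operand of t, t is computed at nodes 3 and 5, and nodes 6 and 7 are synthetic.

```python
def down_safety(succ, end, comp, transp):
    """Backward analysis A_ds; returns (N-DSafe, X-DSafe) as dicts of booleans."""
    n_ds = {n: True for n in succ}  # start from the top element true
    x_ds = {n: True for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # meet (and) over successors; the start information false sits at e
            x = False if n == end else all(n_ds[m] for m in succ[n])
            # local semantic function [n]_ds(b) = Comp(n) or (b and Transp(n))
            e = n in comp or (x and n in transp)
            if (e, x) != (n_ds[n], x_ds[n]):
                n_ds[n], x_ds[n] = e, x
                changed = True
    return n_ds, x_ds


succ = {1: [2, 3], 2: [6], 3: [7], 4: [5], 5: [], 6: [4], 7: [4]}
n_ds, x_ds = down_safety(succ, end=5, comp={3, 5}, transp={1, 3, 4, 5, 6, 7})
print(sorted(n for n in succ if n_ds[n]))  # [3, 4, 5, 6, 7]
```

Node 2 is exit-down-safe but not entry-down-safe, because it modifies an operand of t; consequently node 1 is not down-safe either.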

Start Information. The start information of Ads is given by the element 

$$\mathit{false} \in \mathbb{B}$$

Intuitively, this choice of the start information expresses that t cannot be 
used after the termination of the argument program. 

Interpretation. The interpretation of lattice elements in $\mathbb{B}$ is given by the identity on $\mathbb{B}$. Thus, the function $\mathit{Int}_{ds} : \mathbb{B} \to \mathbb{B}$ is defined by

$$\mathit{Int}_{ds} =_{df} \mathit{Id}_{\mathbb{B}}$$



4.1.2 Proving Precision 

Important for proving the precision of $\mathcal{A}_{ds}$ is the fact that each of the local semantic functions $[\![ n ]\!]_{ds}$, $n \in N$, is either one of the constant functions $\mathit{Const}_{true}$ and $\mathit{Const}_{false}$, or the identity $\mathit{Id}_{\mathbb{B}}$ on $\mathbb{B}$. This follows immediately from the definition of the local semantic functions.

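Lemma 4.1.1.

$$[\![ n ]\!]_{ds} = \begin{cases} \mathit{Const}_{true} & \text{if } \mathit{Comp}(n) \\ \mathit{Id}_{\mathbb{B}} & \text{if } \neg\mathit{Comp}(n) \wedge \mathit{Transp}(n) \\ \mathit{Const}_{false} & \text{if } \neg(\mathit{Comp}(n) \vee \mathit{Transp}(n)) \end{cases}$$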

Note that constant functions and the identity on a lattice are distributive. This directly implies that the local semantic functions of the down-safety analysis are distributive, too.

Lemma 4.1.2. The functions $\mathit{Const}_{true}$, $\mathit{Const}_{false}$, and $\mathit{Id}_{\mathbb{B}}$ are distributive.
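Because $\mathbb{B}$ has only two elements, Lemmas 4.1.1 and 4.1.2 can be checked exhaustively. The Python sketch below tests distributivity over the meet ∧ for the three functions of Lemma 4.1.2, adds negation as a contrasting function that is not distributive, and classifies $[\![ n ]\!]_{ds}$ for all combinations of Comp(n) and Transp(n). The helper names are hypothetical.

```python
from itertools import product

B = (False, True)


def const_true(b):
    return True


def const_false(b):
    return False


def ident(b):
    return b


def negation(b):
    return not b  # contrast: not distributive


def distributive(f):
    """f(a and b) == f(a) and f(b) for all a, b in B."""
    return all(f(a and b) == (f(a) and f(b)) for a, b in product(B, repeat=2))


def classify_ds(comp_n, transp_n):
    """Name the function b -> Comp(n) or (b and Transp(n)) as in Lemma 4.1.1."""
    def f(b):
        return comp_n or (b and transp_n)

    for name, g in (("Const_true", const_true), ("Id", ident), ("Const_false", const_false)):
        if all(f(b) == g(b) for b in B):
            return name


print([distributive(f) for f in (const_true, const_false, ident, negation)])
# [True, True, True, False]
for c, tr in product(B, repeat=2):
    print(c, tr, classify_ds(c, tr))
```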




The following variant of Lemma 4.1.1 will be convenient for proving the ds-Precision Theorem 4.1.1.

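Lemma 4.1.3.

1. $\forall n \in N\ \forall b \in \mathbb{B}.\ \mathit{Comp}(n) \Rightarrow [\![ n ]\!]_{ds}(b) = \mathit{true}$

2. $\forall n \in N\ \forall b \in \mathbb{B}.\ \neg\mathit{Comp}(n) \Rightarrow \big([\![ n ]\!]_{ds}(b) = \mathit{true} \Longleftrightarrow b = \mathit{true} \wedge \mathit{Transp}(n)\big)$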

Descending Chain Condition. The lattice of Boolean truth values is fi- 
nite. Consequently, every descending chain is finite as well. Thus, we have: 

Lemma 4.1.4 (Descending Chain Condition).

The lattice $(\mathbb{B}, \wedge, \le, \mathit{false}, \mathit{true})$ satisfies the descending chain condition.

Distributivity. The distributivity of the local semantic functions $[\![ n ]\!]_{ds}$, $n \in N$, follows immediately from Lemma 4.1.1 and Lemma 4.1.2.

Lemma 4.1.5 ($[\![\ ]\!]_{ds}$-Distributivity).

The local semantic functions $[\![ n ]\!]_{ds}$, $n \in N$, are distributive.

ds-Precision. The last step required for verifying the precision of $\mathcal{A}_{ds}$ is to prove that $\mathcal{A}_{ds}$ is ds-precise. This means that down-safety in the sense of Definition 3.3.1 coincides with the meet over all paths solution of $\mathcal{A}_{ds}$, as expressed by Theorem 4.1.1. Without loss of generality, we will only prove the first part of this theorem. It is the relevant one for defining the BCM-transformation, and the second part can be proved in the same fashion.

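Theorem 4.1.1 (ds-Precision).

For all nodes $n \in N$ we have:

1. $\textit{N-DSafe}(n)$ if and only if $\mathit{Int}_{ds}(\textit{X-MOP}_{([\![\ ]\!]_{ds},\,\mathit{false})}(n))$

2. $\textit{X-DSafe}(n)$ if and only if $\mathit{Int}_{ds}(\textit{N-MOP}_{([\![\ ]\!]_{ds},\,\mathit{false})}(n))$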

Proof.

As mentioned above, we concentrate on the first part of Theorem 4.1.1. In order to simplify the notation, we abbreviate $\mathit{Int}_{ds} \circ \textit{X-MOP}_{([\![\ ]\!]_{ds},\,\mathit{false})}$ by X-MOP throughout the proof.

The first implication, “⇒”,

$$\forall n \in N.\ \textit{N-DSafe}(n) \Rightarrow \textit{X-MOP}(n)$$

is proved by showing the even stronger implication

$$\forall p \in \mathbf{P}_G[n,e].\ \big(\exists\, 1 \le i \le \lambda_p.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)\big) \Rightarrow [\![ p ]\!]_{ds}(\mathit{false}) = \mathit{true} \tag{4.1}$$

Implication (4.1) will simultaneously be proved for all nodes $n \in N$ by induction on the length k of all paths $p \in \mathbf{P}_G[n,e]$ satisfying

$$\exists\, 1 \le i \le \lambda_p = k.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[) \tag{4.2}$$

Obviously, the case k = 0 does not occur. Thus, let p be a path of length k = 1. In this case (4.2) yields

$$\mathit{Comp}(p_1)$$

Applying Lemma 4.1.3(1) we therefore obtain as desired

$$[\![ p ]\!]_{ds}(\mathit{false}) = [\![ p_1 ]\!]_{ds}(\mathit{false}) = \mathit{true}$$



In order to prove the induction step, let k > 1, and assume that (4.1) holds for all paths q with $\lambda_q < k$, i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G[n,e].\ 1 \le \lambda_q < k).\ \big(\exists\, 1 \le i \le \lambda_q.\ \mathit{Comp}(q_i) \wedge \mathit{Transp}^{\forall}(q[1,i[)\big) \Rightarrow [\![ q ]\!]_{ds}(\mathit{false}) = \mathit{true}$$

Now, it is sufficient to show that $[\![ p ]\!]_{ds}(\mathit{false}) = \mathit{true}$ holds for every path $p \in \mathbf{P}_G[n,e]$ with $\lambda_p = k$ satisfying (4.2).

Without loss of generality, we can assume that there is such a path p, which then can obviously be rewritten as $p = (n);p'$ for some $p' \in \mathbf{P}_G[m,e]$ with $m \in \mathit{succ}_G(n)$. Two cases must be investigated next.

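Case 1. $\mathit{Comp}(n)$

In this case, Lemma 4.1.3(1) yields as desired

$$[\![ p ]\!]_{ds}(\mathit{false}) = [\![ n ]\!]_{ds}([\![ p' ]\!]_{ds}(\mathit{false})) = \mathit{true}$$

Case 2. $\neg\mathit{Comp}(n)$

In this case, (4.2) guarantees the existence of an index i with $2 \le i \le k$ and

$$\mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)$$

Since $i \ge 2$, this includes $\mathit{Transp}(n)$, and p' satisfies the premise of (IH) with index $i - 1$. Thus, the induction hypothesis (IH) can be applied, yielding

$$[\![ p' ]\!]_{ds}(\mathit{false}) = \mathit{true}$$

Applying Lemma 4.1.3(2) we now obtain the following sequence of equalities

$$[\![ p ]\!]_{ds}(\mathit{false}) = [\![ n ]\!]_{ds}([\![ p' ]\!]_{ds}(\mathit{false})) = [\![ n ]\!]_{ds}(\mathit{true}) = \mathit{true}$$

which completes the proof of the first implication.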

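The second implication, “⇐”,

$$\forall n \in N.\ \textit{X-MOP}(n) \Rightarrow \textit{N-DSafe}(n)$$

is proved by showing the even stronger implication

$$\forall p \in \mathbf{P}_G[n,e].\ \big([\![ p ]\!]_{ds}(\mathit{false}) = \mathit{true} \Rightarrow \exists\, 1 \le i \le \lambda_p.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)\big) \tag{4.3}$$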

Also this implication is simultaneously proved for all nodes $n \in N$ by induction on the length k of all paths $p \in \mathbf{P}_G[n,e]$ satisfying

$$[\![ p ]\!]_{ds}(\mathit{false}) = \mathit{true} \tag{4.4}$$

As in the proof of the first implication, the case k = 0 does not occur, since we are dealing with entry-down-safety. Thus, we can start by considering a path p of length k = 1. In this case, equation (4.4) yields

$$[\![ p ]\!]_{ds}(\mathit{false}) = [\![ p_1 ]\!]_{ds}(\mathit{false}) = \mathit{true}$$

By means of Lemma 4.1.3 we therefore obtain

$$\mathit{Comp}(p_1)$$

Hence, the induction basis follows for i = 1.

For the proof of the induction step, let k > 1, and assume that (4.3) holds for all paths q with $\lambda_q < k$, i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G[n,e].\ 1 \le \lambda_q < k).\ \big([\![ q ]\!]_{ds}(\mathit{false}) = \mathit{true} \Rightarrow \exists\, 1 \le i \le \lambda_q.\ \mathit{Comp}(q_i) \wedge \mathit{Transp}^{\forall}(q[1,i[)\big)$$

It is sufficient to show that for each path $p \in \mathbf{P}_G[n,e]$ with $\lambda_p = k$ satisfying equation (4.4)

$$\exists\, 1 \le i \le \lambda_p = k.\ \mathit{Comp}(p_i) \wedge \mathit{Transp}^{\forall}(p[1,i[)$$

holds. Without loss of generality, we can assume that there is such a path p. Obviously, this path can be rewritten as $p = (n);p'$ for some $p' \in \mathbf{P}_G[m,e]$ with $m \in \mathit{succ}_G(n)$. Similarly to the proof of the first implication, two cases must be investigated next in order to complete the proof.

Case 1. $\mathit{Comp}(n)$

In this case the induction step follows trivially for i = 1.

Case 2. $\neg\mathit{Comp}(n)$

Applying Lemma 4.1.3(2), we here obtain

$$\mathit{Transp}(n) \tag{4.5}$$

and

$$[\![ p' ]\!]_{ds}(\mathit{false}) = \mathit{true} \tag{4.6}$$

Applying now the induction hypothesis (IH) to p', we obtain the existence of an index $i_{p'}$ with $1 \le i_{p'} < k$ and

$$\mathit{Comp}(p'_{i_{p'}}) \wedge \mathit{Transp}^{\forall}(p'[1,i_{p'}[) \tag{4.7}$$

Combining (4.5) and (4.7), the induction step follows for $i = i_{p'} + 1$. This completes the proof of the second implication and finishes the proof of the relevant part of Theorem 4.1.1. □

Combining Lemma 4.1.4, Lemma 4.1.5, and Theorem 4.1.1, we obtain as desired that $\mathcal{A}_{ds}$ is precise for the predicate down-safe. In particular, this guarantees that the MFP-solution computed by $\mathcal{A}_{ds}$ coincides with the set of all program points being down-safe in the sense of Definition 3.3.1.

Theorem 4.1.2 ($\mathcal{A}_{ds}$-Precision).

$\mathcal{A}_{ds}$ is precise for down-safety, i.e., $\mathcal{A}_{ds}$ is terminating and ds-precise.
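On an acyclic flow graph, the first part of Theorem 4.1.1 can be checked by brute force. The Python sketch below enumerates all paths from a node to the end node, evaluates the path function for each of them starting from the start information false, takes the meet of the results, and compares this meet-over-all-paths value with entry-down-safety evaluated directly on the paths as in Definition 3.3.1. Restricting to acyclic graphs is an assumption that keeps the path set finite. The example graph is hypothetical: node 2 modifies an operand of t, and t is computed at nodes 3 and 5.

```python
def paths_to_end(succ, n, end):
    """All paths from n to end; succ must describe an acyclic graph."""
    if n == end:
        return [[n]]
    return [[n] + rest for m in succ[n] for rest in paths_to_end(succ, m, end)]


def mop_ds(succ, n, end, comp, transp):
    """Meet (and) over all paths of [p]_ds(false), applying local functions from e backwards."""
    values = []
    for p in paths_to_end(succ, n, end):
        b = False  # start information at the end node
        for node in reversed(p):
            b = node in comp or (b and node in transp)  # [node]_ds(b)
        values.append(b)
    return all(values)


def n_dsafe_by_definition(succ, n, end, comp, transp):
    """Every path from n contains a computation of t preceded only by transparent nodes."""
    def ok(p):
        return any(node in comp and all(x in transp for x in p[:i])
                   for i, node in enumerate(p))
    return all(ok(p) for p in paths_to_end(succ, n, end))


succ = {1: [2, 3], 2: [6], 3: [7], 4: [5], 5: [], 6: [4], 7: [4]}
comp, transp = {3, 5}, {1, 3, 4, 5, 6, 7}
for n in succ:
    assert mop_ds(succ, n, 5, comp, transp) == n_dsafe_by_definition(succ, n, 5, comp, transp)
print({n: mop_ds(succ, n, 5, comp, transp) for n in succ})
# {1: False, 2: False, 3: True, 4: True, 5: True, 6: True, 7: True}
```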

4.2 DFA-Algorithm $\mathcal{A}_{ea}$: Earliestness

In this section we present the DFA-algorithm $\mathcal{A}_{ea}$ for computing the set of earliest program points.² The main result of this section, the $\mathcal{A}_{ea}$-Precision Theorem 4.2.2, yields that it is indeed precise for this property: it terminates with exactly the set of program points that are earliest in the sense of Definition 3.3.2.

4.2.1 Specification 

Data Flow Information. The domain of data flow information of $\mathcal{A}_{ea}$ is given by the lattice of Boolean truth values

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathbb{B}, \vee, \ge, \mathit{true}, \mathit{false})$$

Intuitively, the data flow information attached to a program point indicates whether a placement of t is earliest at this point. In contrast to the down-safety analysis, however, the earliestness analysis uses the corresponding dual lattice. This is necessary because the generic algorithm of Section 2.2.5 is tailored for computing the greatest solution of an equation system, whereas the straightforward specification of the earliestness analysis requires the computation of the least solution.

Local Semantic Functional. The local semantic functional $[\![\ ]\!]_{ea} : N \to (\mathbb{B} \to \mathbb{B})$ of $\mathcal{A}_{ea}$ is defined by³

$$\forall n \in N\ \forall b \in \mathbb{B}.\ [\![ n ]\!]_{ea}(b) =_{df} \neg\mathit{Transp}(n) \vee (b \wedge \neg\mathit{DSafe}(n))$$

Intuitively, a placement of t is earliest at the exit of a node n if t is modified by the assignment of node n (i.e., $\neg\mathit{Transp}(n)$), or if it is earliest at its entry (i.e., b = true) and a placement of t is not down-safe there (i.e., $\neg\mathit{DSafe}(n)$).
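A corresponding Python sketch of $\mathcal{A}_{ea}$ propagates information forward. Because the dual lattice is used, the meet is ∨, the iteration starts from false (the top element of the dual lattice), and the start information true is attached to the entry of s. The down-safety information is passed in as a node set; here it is assumed to be DSafe = {3, 4, 5, 6, 7}, which is what entry-down-safety yields for the hypothetical graph below (node 2 modifies an operand of t, t is computed at nodes 3 and 5, nodes 6 and 7 are synthetic). The last line evaluates the BCM insertion points of Table 3.2.

```python
def earliestness(succ, start, dsafe, transp):
    """Forward analysis A_ea on the dual lattice; returns (N-Earliest, X-Earliest)."""
    pred = {n: [] for n in succ}
    for m, ns in succ.items():
        for n in ns:
            pred[n].append(m)
    n_ea = {n: False for n in succ}  # top of the dual lattice is false
    x_ea = {n: False for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # meet in the dual lattice is "or"; start information true at s
            e = True if n == start else any(x_ea[m] for m in pred[n])
            # local semantic function [n]_ea(b) = not Transp(n) or (b and not DSafe(n))
            x = n not in transp or (e and n not in dsafe)
            if (e, x) != (n_ea[n], x_ea[n]):
                n_ea[n], x_ea[n] = e, x
                changed = True
    return n_ea, x_ea


succ = {1: [2, 3], 2: [6], 3: [7], 4: [5], 5: [], 6: [4], 7: [4]}
dsafe = {3, 4, 5, 6, 7}
n_ea, _ = earliestness(succ, start=1, dsafe=dsafe, transp={1, 3, 4, 5, 6, 7})
print(sorted(n for n in succ if n_ea[n]))   # [1, 2, 3, 6]
print(sorted(n for n in dsafe if n_ea[n]))  # Insert_BCM: [3, 6]
```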

² The index ea stands for earliestness.

³ Recall that $\mathit{DSafe} =_{df} \textit{N-DSafe}$ (cf. Section 3.3.1).
Start Information. The start information of $\mathcal{A}_{ea}$ is given by the element

$$\mathit{true} \in \mathbb{B}$$

Intuitively, a computation cannot be placed earlier than at the entry of the 
start node of a program. 

Interpretation. As for down-safety, also for earliestness the interpretation
of the lattice elements of $\mathcal{B}$ is given by the identity on $\mathcal{B}$, i.e., the function
$Int_{ea} : \mathcal{B} \to \mathcal{B}$ is defined by

$$Int_{ea} =_{df} Id_{\mathcal{B}}$$
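To make the specification concrete, the following Python sketch instantiates $\mathcal{A}_{ea}$ on a small flow graph. It is an illustration under stated assumptions, not the generic algorithm of Section 2.2.5: nodes are plain strings, the hypothetical dictionaries `transp` and `dsafe` stand for the predicates Transp and DSafe of the term t, the start node is assumed to have no predecessors, and a naive round-robin iteration replaces a worklist.

```python
def ea_transfer(transp: bool, dsafe: bool, b: bool) -> bool:
    """Local semantic function [[n]]_ea: information at the entry of n -> exit of n."""
    return (not transp) or (b and not dsafe)


def earliest_entry(nodes, preds, start, transp, dsafe):
    """Iterate A_ea to its fixed point and return the entry information per node.

    The meet of the dual lattice is 'or' and its top element is False, so every
    node starts at False; the start node receives the start information True.
    Values only move from False to True, hence the loop terminates.
    """
    info = {n: False for n in nodes}
    info[start] = True  # assumption: the start node has no predecessors
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            new = any(ea_transfer(transp[m], dsafe[m], info[m]) for m in preds[n])
            if new != info[n]:
                info[n] = new
                changed = True
    return info


# Hypothetical straight-line graph s -> a -> b -> e; node a modifies an operand of t.
nodes = ["s", "a", "b", "e"]
preds = {"s": [], "a": ["s"], "b": ["a"], "e": ["b"]}
transp = {"s": True, "a": False, "b": True, "e": True}
dsafe = {"s": False, "a": False, "b": True, "e": False}
print(earliest_entry(nodes, preds, "s", transp, dsafe))
```

In this toy graph, b is the only node that is both down-safe and earliest, which matches the intuition that the modification of an operand in a blocks any further hoisting.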



4.2.2 Proving Precision 

Like the local semantic functions for down-safety, also the local semantic
functions for earliestness $[\![n]\!]_{ea}$, $n \in N$, can be characterized in terms of the
identity and the constant functions on $\mathcal{B}$:

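Lemma 4.2.1.

$$\forall n \in N.\ [\![n]\!]_{ea} = \begin{cases} Const_{\mathit{true}} & \text{if } \neg Transp(n)\\ Id_{\mathcal{B}} & \text{if } Transp(n) \wedge \neg DSafe(n)\\ Const_{\mathit{false}} & \text{if } Transp(n) \wedge DSafe(n) \end{cases}$$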
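Because $\mathcal{B}$ has only two elements, Lemma 4.2.1 can be confirmed by exhaustive enumeration. The sketch below identifies each function on $\mathcal{B}$ by its value table $(f(\mathit{false}), f(\mathit{true}))$ and compares it with the case distinction of the lemma; the helper names `classify` and `lemma_4_2_1` are hypothetical.

```python
from itertools import product


def ea_transfer(transp: bool, dsafe: bool, b: bool) -> bool:
    return (not transp) or (b and not dsafe)


def classify(f) -> str:
    """Identify a function B -> B by its value table (f(False), f(True))."""
    names = {
        (True, True): "Const_true",
        (False, True): "Id_B",
        (False, False): "Const_false",
        (True, False): "Neg",
    }
    return names[(f(False), f(True))]


def lemma_4_2_1(transp: bool, dsafe: bool) -> str:
    """The case distinction of Lemma 4.2.1."""
    if not transp:
        return "Const_true"
    return "Id_B" if not dsafe else "Const_false"


for transp, dsafe in product([False, True], repeat=2):
    f = lambda b, t=transp, d=dsafe: ea_transfer(t, d, b)
    assert classify(f) == lemma_4_2_1(transp, dsafe)
print("Lemma 4.2.1 holds for all four combinations")
```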

Similar to Lemma 4.1.1, this lemma can be rewritten as shown in Lemma
4.2.2. This variant will be used in the proof of the $\mathcal{A}_{ea}$-Precision Theorem 4.2.2.

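Lemma 4.2.2.

1. $\forall n \in N\ \forall b \in \mathcal{B}.\ \neg Transp(n) \Rightarrow [\![n]\!]_{ea}(b) = \mathit{true}$

2. $\forall n \in N\ \forall b \in \mathcal{B}.\ Transp(n) \Rightarrow \big([\![n]\!]_{ea}(b) = \mathit{true} \iff \neg DSafe(n) \wedge b = \mathit{true}\big)$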

Descending Chain Condition. The finiteness of $\mathcal{B}$ guarantees:

Lemma 4.2.3 (Descending Chain Condition).

The lattice $(\mathcal{B}, \vee, \geq, \mathit{true}, \mathit{false})$ satisfies the descending chain condition.

Distributivity. Combining Lemma 4.2.1 and Lemma 4.1.2, we obtain as
desired the distributivity of the local semantic functions $[\![n]\!]_{ea}$, $n \in N$.

Lemma 4.2.4 ($[\![\ ]\!]_{ea}$-Distributivity).

The local semantic functions $[\![n]\!]_{ea}$, $n \in N$, are distributive.
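The section relies on Lemma 4.1.2 without restating it; the sketch below assumes it asserts the distributivity of the constant functions and the identity on $\mathcal{B}$. A brute-force check in Python confirms this for both candidate meets, $\vee$ (used by the dual lattice of $\mathcal{A}_{ea}$) and $\wedge$, and shows that negation, the only remaining function on $\mathcal{B}$, is excluded.

```python
from itertools import product

B = [False, True]
FUNCS = {
    "Const_true": lambda b: True,
    "Id_B": lambda b: b,
    "Const_false": lambda b: False,
}


def is_distributive(f, meet) -> bool:
    """Check f(x meet y) == f(x) meet f(y) for all x, y in B."""
    return all(f(meet(x, y)) == meet(f(x), f(y)) for x, y in product(B, repeat=2))


def lor(x, y):
    return x or y


def land(x, y):
    return x and y


for name, f in FUNCS.items():
    assert is_distributive(f, lor) and is_distributive(f, land), name

# Negation is not distributive w.r.t. 'or':
# not (False or True) is False, but (not False) or (not True) is True.
assert not is_distributive(lambda b: not b, lor)
```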
ea-Precision. Theorem 4.2.1 provides the final step of proving the precision
of $\mathcal{A}_{ea}$. It yields the coincidence of earliestness in the sense of Definition 3.3.2
and the meet over all paths solution of $\mathcal{A}_{ea}$. As for Theorem 4.1.1, we only
prove the first part of this theorem, which is the relevant one for defining the
BCM-transformation. The second part can be proved analogously.

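Theorem 4.2.1 (ea-Precision).

For all nodes $n \in N$ we have:

1. $N\text{-}Earliest(n)$ if and only if $Int_{ea}(N\text{-}MOP_{([\![\ ]\!]_{ea},\, \mathit{true})}(n))$

2. $X\text{-}Earliest(n)$ if and only if $Int_{ea}(X\text{-}MOP_{([\![\ ]\!]_{ea},\, \mathit{true})}(n))$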

Proof.

As mentioned before, we concentrate on the first part of Theorem 4.2.1, and
in order to shorten the notation, we abbreviate $Int_{ea} \circ N\text{-}MOP_{([\![\ ]\!]_{ea},\, \mathit{true})}$ by
$N\text{-}MOP$ throughout the proof.

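The first implication, “$\Rightarrow$”,

$$\forall n \in N.\ N\text{-}Earliest(n) \Rightarrow N\text{-}MOP(n)$$

is proved by showing the equivalent formula:

$$\forall p \in \mathbf{P}_G[s,n[.\ \big(\forall 1 \le i \le \lambda_p.\ DSafe(p_i) \Rightarrow \neg Transp^{\forall}(p[i,\lambda_p])\big) \Rightarrow [\![p]\!]_{ea}(\mathit{true}) = \mathit{true} \qquad (4.8)$$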

It is simultaneously proved for all nodes $n \in N$ by induction on the length
k of all paths $p \in \mathbf{P}_G[s,n[$ satisfying

$$\forall 1 \le i \le \lambda_p = k.\ DSafe(p_i) \Rightarrow \neg Transp^{\forall}(p[i,\lambda_p]) \qquad (4.9)$$

For $k = 0$, we obtain $n = s$ and $p = \varepsilon$. In this case the desired equality

$$[\![p]\!]_{ea}(\mathit{true}) = [\![\varepsilon]\!]_{ea}(\mathit{true}) = \mathit{true}$$

holds trivially.

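In order to prove the induction step, let $k > 0$, and assume that (4.8)
holds for all paths q with $\lambda_q < k$, i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G[s,n[.\ 0 \le \lambda_q < k).\ \big(\forall 1 \le i \le \lambda_q.\ DSafe(q_i) \Rightarrow \neg Transp^{\forall}(q[i,\lambda_q])\big) \Rightarrow [\![q]\!]_{ea}(\mathit{true}) = \mathit{true}$$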

It is sufficient to show that for every path $p \in \mathbf{P}_G[s,n[$ with $\lambda_p = k$ satisfying
(4.9) the following holds:

$$[\![p]\!]_{ea}(\mathit{true}) = \mathit{true}$$

Without loss of generality we can assume that there is such a path p, which
can then be rewritten as $p = p';(m)$ for some $p' \in \mathbf{P}_G[s,m[$ with
$m \in pred_G(n)$. This leaves us with the investigation of the following two cases.
Case 1. $\neg Transp(m)$

In this case, the desired sequence of equalities

$$[\![p]\!]_{ea}(\mathit{true}) = [\![m]\!]_{ea}([\![p']\!]_{ea}(\mathit{true})) = \mathit{true}$$

is a direct consequence of Lemma 4.2.2(1).

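Case 2. $Transp(m)$

Applying (4.9) with $i = k$ (note that $p_k = m$), we obtain $\neg DSafe(m)$. Moreover,
since m is transparent, $\neg Transp^{\forall}(p[i,k])$ implies $\neg Transp^{\forall}(p[i,k[)$ for every
$i < k$, and therefore

$$\forall 1 \le i < \lambda_p = k.\ DSafe(p_i) \Rightarrow \neg Transp^{\forall}(p[i,k[)$$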

Thus, the induction hypothesis (IH) yields

$$[\![p']\!]_{ea}(\mathit{true}) = \mathit{true}$$

Combining this with Lemma 4.2.2(2), we obtain as desired

$$[\![p]\!]_{ea}(\mathit{true}) = [\![m]\!]_{ea}([\![p']\!]_{ea}(\mathit{true})) = [\![m]\!]_{ea}(\mathit{true}) = \mathit{true}$$

which completes the proof of the first implication.

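The second implication, “$\Leftarrow$”,

$$\forall n \in N.\ N\text{-}MOP(n) \Rightarrow N\text{-}Earliest(n)$$

is equivalent to

$$\forall p \in \mathbf{P}_G[s,n[.\ \big([\![p]\!]_{ea}(\mathit{true}) = \mathit{true}\big) \Rightarrow \big(\forall 1 \le i \le \lambda_p.\ DSafe(p_i) \Rightarrow \neg Transp^{\forall}(p[i,\lambda_p])\big) \qquad (4.10)$$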

This implication is now simultaneously proved for all nodes $n \in N$ by
induction on the length k of all paths $p \in \mathbf{P}_G[s,n[$ satisfying

$$[\![p]\!]_{ea}(\mathit{true}) = \mathit{true} \qquad (4.11)$$

For $k = 0$, we obtain $p = \varepsilon$, and the implication

$$\forall 1 \le i \le 0.\ DSafe(p_i) \Rightarrow \neg Transp^{\forall}(p[i,0])$$

holds trivially, since the range of i is empty.

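In order to prove the induction step, let $k > 0$, and assume that (4.10)
holds for all paths q with $\lambda_q < k$, i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G[s,n[.\ 0 \le \lambda_q < k).\ \big([\![q]\!]_{ea}(\mathit{true}) = \mathit{true}\big) \Rightarrow \big(\forall 1 \le i \le \lambda_q.\ DSafe(q_i) \Rightarrow \neg Transp^{\forall}(q[i,\lambda_q])\big)$$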

It is sufficient to show that for every path $p \in \mathbf{P}_G[s,n[$ with $\lambda_p = k$ satisfying
(4.11) the following holds:

$$\forall 1 \le i \le \lambda_p = k.\ DSafe(p_i) \Rightarrow \neg Transp^{\forall}(p[i,\lambda_p])$$
Hence, we can assume the existence of such a path p, which can obviously
be rewritten as $p = p';(m)$ for some $p' \in \mathbf{P}_G[s,m[$ with $m \in pred_G(n)$.
Similarly to the proof of the first implication, two cases must be investigated
next.

Case 1. $\neg Transp(m)$

In this case the induction step holds trivially because $Transp(p_{\lambda_p})$ does not
hold.

Case 2. $Transp(m)$

In this case we obtain by means of (4.11) and Lemma 4.2.2(2)

$$\neg DSafe(m) \wedge [\![p']\!]_{ea}(\mathit{true}) = \mathit{true}$$

Applying now the induction hypothesis (IH) to p' yields as desired the in- 
duction step. This completes the proof of the second implication, and finishes 
the proof of the relevant part of Theorem 4.2.1. □ 

Combining Lemma 4.2.3, Lemma 4.2.4, and Theorem 4.2.1, we obtain the
desired precision of $\mathcal{A}_{ea}$. In particular, this guarantees that the MFP-solution
computed by $\mathcal{A}_{ea}$ coincides with the set of all program points being earliest
in the sense of Definition 3.3.2.

Theorem 4.2.2 ($\mathcal{A}_{ea}$-Precision).

$\mathcal{A}_{ea}$ is precise for earliestness, i.e., $\mathcal{A}_{ea}$ is terminating and ea-precise.
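For acyclic flow graphs the meet over all paths can be computed by explicit path enumeration, so the coincidence of the MOP and MFP solutions that underlies Theorem 4.2.1 and Theorem 4.2.2 can be observed on an example. The Python sketch below is hypothetical: `paths_to` enumerates $\mathbf{P}_G[s,n[$ recursively (and therefore only terminates on acyclic graphs), `mop` folds $[\![\ ]\!]_{ea}$ along each path starting from the start information true and combines the results with $\vee$, and `mfp` is a naive fixed-point iteration.

```python
def ea_transfer(transp, dsafe, b):
    return (not transp) or (b and not dsafe)


def paths_to(n, preds, start):
    """All paths in P_G[start, n[ (n itself excluded); acyclic graphs only."""
    if n == start:
        return [[]]
    return [p + [m] for m in preds[n] for p in paths_to(m, preds, start)]


def mop(n, preds, start, transp, dsafe):
    result = False  # top of the dual lattice; the meet is 'or'
    for path in paths_to(n, preds, start):
        b = True  # start information
        for m in path:
            b = ea_transfer(transp[m], dsafe[m], b)
        result = result or b
    return result


def mfp(nodes, preds, start, transp, dsafe):
    info = {n: False for n in nodes}
    info[start] = True
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n != start:
                new = any(ea_transfer(transp[m], dsafe[m], info[m]) for m in preds[n])
                if new != info[n]:
                    info[n], changed = new, True
    return info


# Hypothetical diamond s -> {a, b} -> c -> e.
nodes = ["s", "a", "b", "c", "e"]
preds = {"s": [], "a": ["s"], "b": ["s"], "c": ["a", "b"], "e": ["c"]}
transp = {"s": True, "a": False, "b": True, "c": True, "e": True}
dsafe = {"s": False, "a": False, "b": True, "c": True, "e": False}
fixed_point = mfp(nodes, preds, "s", transp, dsafe)
assert all(mop(n, preds, "s", transp, dsafe) == fixed_point[n] for n in nodes)
print(fixed_point)
```

At c the two incoming paths disagree (the path through a yields true, the path through b yields false); the meet $\vee$ makes c earliest, in line with the existential reading of the least solution.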



4.3 DFA-Algorithm $\mathcal{A}_{dl}$: Delayability

In this section we present the specification of the DFA-algorithm $\mathcal{A}_{dl}$ for
computing the set of delayable program points and prove it to be precise.*
In fact, this will be guaranteed by the main result of this section, the
$\mathcal{A}_{dl}$-Precision Theorem 4.3.2. It yields that $\mathcal{A}_{dl}$ terminates with the set of all
program points being delayable in the sense of Definition 3.4.1.



4.3.1 Specification 

Data Flow Information. The domain of data flow information of $\mathcal{A}_{dl}$ is
given by the lattice of Boolean truth values

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathcal{B}, \wedge, \leq, \mathit{false}, \mathit{true})$$

In the context of this analysis, the data flow information attached to a program
point indicates whether an insertion of the BCM-transformation can be
delayed to the program point under consideration.

* The index dl stands for delayability.

Local Semantic Functional. The local semantic functional
$[\![\ ]\!]_{dl} : N \to (\mathcal{B} \to \mathcal{B})$ of $\mathcal{A}_{dl}$ is defined by

$$\forall n \in N\ \forall b \in \mathcal{B}.\ [\![n]\!]_{dl}(b) =_{df} (b \vee Insert_{BCM}(n)) \wedge \neg Comp(n)$$

Intuitively, an insertion of the BCM-transformation can be delayed to the
exit of a node n if the term t under consideration is not blocked by n
(i.e., $\neg Comp(n)$), and if the insertion can be delayed to the entry of n.
The latter holds trivially if n is an insertion point of the BCM-transformation
(i.e., $Insert_{BCM}(n)$), or, alternatively, if the argument of $[\![n]\!]_{dl}$ is true (i.e.,
$b = \mathit{true}$).

Start Information. The start information is given by the element

$$Insert_{BCM}(s) \in \mathcal{B}$$

Intuitively, this choice of the start information reflects that the process of
moving the insertions of the BCM-transformation in the direction of the
control flow to “later” program points starts in the insertion points of the
BCM-transformation.

Interpretation. The interpretation of lattice elements in $\mathcal{B}$ is given by the
identity on $\mathcal{B}$. This means the function $Int_{dl} : \mathcal{B} \to \mathcal{B}$ is defined by

$$Int_{dl} =_{df} Id_{\mathcal{B}}$$
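A minimal Python sketch of $\mathcal{A}_{dl}$, again with a naive round-robin iteration instead of the generic algorithm. The dictionaries `insert` and `comp` are hypothetical stand-ins for $Insert_{BCM}$ and Comp, the start node is assumed to have no predecessors, and the final line combines the fixed-point values as in part 1 of the dl-Precision Theorem 4.3.1, relying on the coincidence of the fixed-point and meet over all paths solutions.

```python
def dl_transfer(insert: bool, comp: bool, b: bool) -> bool:
    """Local semantic function [[n]]_dl: entry information -> exit information."""
    return (b or insert) and not comp


def delayable_entry(nodes, preds, start, insert, comp):
    """Fixed point of A_dl, combined with Insert_BCM as in Theorem 4.3.1(1).

    The meet is 'and' and the top element is True, so non-start nodes begin
    at True and can only drop to False. The start node gets Insert_BCM(s).
    """
    info = {n: True for n in nodes}
    info[start] = insert[start]  # assumption: the start node has no predecessors
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            new = all(dl_transfer(insert[m], comp[m], info[m]) for m in preds[n])
            if new != info[n]:
                info[n] = new
                changed = True
    # N-Delayable(n) iff Insert_BCM(n) or the fixed-point value at n.
    return {n: insert[n] or info[n] for n in nodes}


# Hypothetical chain s -> a -> b -> e with the BCM insertion at s and a use of t at b.
nodes = ["s", "a", "b", "e"]
preds = {"s": [], "a": ["s"], "b": ["a"], "e": ["b"]}
insert = {"s": True, "a": False, "b": False, "e": False}
comp = {"s": False, "a": False, "b": True, "e": False}
print(delayable_entry(nodes, preds, "s", insert, comp))
```

The insertion placed at s can be delayed down to the entry of b, the first node computing t, but not beyond it.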



4.3.2 Proving Precision 

Like for down-safety and earliestness, we first introduce a lemma characterizing
the local semantic functions of the delayability analysis in terms of the
constant functions and the identity on $\mathcal{B}$. This lemma follows immediately
from the definition of the local semantic functions. Moreover, we present two
further lemmas, which are helpful for proving the $\mathcal{A}_{dl}$-Precision Theorem
4.3.2. The first one is a simple consequence of the definition of intraprocedural
delayability (cf. Definition 3.4.1), and the second one is a reformulation
of Lemma 4.3.1.

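Lemma 4.3.1.

$$\forall n \in N.\ [\![n]\!]_{dl} = \begin{cases} Const_{\mathit{true}} & \text{if } \neg Comp(n) \wedge Insert_{BCM}(n)\\ Id_{\mathcal{B}} & \text{if } \neg(Comp(n) \vee Insert_{BCM}(n))\\ Const_{\mathit{false}} & \text{if } Comp(n) \end{cases}$$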
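As for earliestness, the case distinction of Lemma 4.3.1 can be checked exhaustively. The sketch below uses value tables $(f(\mathit{false}), f(\mathit{true}))$ as names for the three functions; `value_table` and the constants are illustrative names only.

```python
from itertools import product


def dl_transfer(insert: bool, comp: bool, b: bool) -> bool:
    return (b or insert) and not comp


def value_table(f):
    """Return (f(False), f(True)), which identifies f among the functions on B."""
    return (f(False), f(True))


CONST_TRUE, ID_B, CONST_FALSE = (True, True), (False, True), (False, False)

for insert, comp in product([False, True], repeat=2):
    table = value_table(lambda b: dl_transfer(insert, comp, b))
    if comp:
        expected = CONST_FALSE   # Comp(n)
    elif insert:
        expected = CONST_TRUE    # not Comp(n) and Insert_BCM(n)
    else:
        expected = ID_B          # neither Comp(n) nor Insert_BCM(n)
    assert table == expected, (insert, comp)
```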

Lemma 4.3.2. $\forall n \in N.\ N\text{-}Delayable(n) \iff Insert_{BCM}(n) \vee \big(\forall p \in \mathbf{P}_G[s,n[\ \exists 1 \le i \le \lambda_p.\ Insert_{BCM}(p_i) \wedge \neg Comp^{\exists}(p[i,\lambda_p])\big)$

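Lemma 4.3.3.

1. $\forall n \in N\ \forall b \in \mathcal{B}.\ Comp(n) \Rightarrow [\![n]\!]_{dl}(b) = \mathit{false}$

2. $\forall n \in N\ \forall b \in \mathcal{B}.\ \neg Comp(n) \Rightarrow \big([\![n]\!]_{dl}(b) = \mathit{true} \iff Insert_{BCM}(n) \vee b = \mathit{true}\big)$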
Descending Chain Condition. The finiteness of $\mathcal{B}$ guarantees that every
descending chain in $\mathcal{B}$ is finite. Thus, we have:

Lemma 4.3.4 (Descending Chain Condition).

The lattice $(\mathcal{B}, \wedge, \leq, \mathit{false}, \mathit{true})$ satisfies the descending chain condition.

Distributivity. Combining Lemma 4.3.1 and Lemma 4.1.2, we get as desired
the distributivity of the local semantic functions of the delayability
analysis.

Lemma 4.3.5 ($[\![\ ]\!]_{dl}$-Distributivity).

The local semantic functions $[\![n]\!]_{dl}$, $n \in N$, are distributive.

dl-Precision. The last step in proving the precision of $\mathcal{A}_{dl}$ is to prove the
coincidence of delayability in the sense of Definition 3.4.1 and the meet over
all paths solution of $\mathcal{A}_{dl}$. This coincidence, which is guaranteed by Theorem
4.3.1, yields the desired dl-precision of $\mathcal{A}_{dl}$. For the same reasons as in the
previous sections, we will only prove the first part of this theorem.

Theorem 4.3.1 (dl-Precision).

For all nodes $n \in N$ we have:

1. $N\text{-}Delayable(n)$ if and only if $Insert_{BCM}(n) \vee Int_{dl}(N\text{-}MOP_{([\![\ ]\!]_{dl},\, Insert_{BCM}(s))}(n))$

2. $X\text{-}Delayable(n)$ if and only if $Int_{dl}(X\text{-}MOP_{([\![\ ]\!]_{dl},\, Insert_{BCM}(s))}(n))$

Proof. As mentioned above, we concentrate on the first part of Theorem
4.3.1. In order to simplify the notation, $Int_{dl} \circ N\text{-}MOP_{([\![\ ]\!]_{dl},\, Insert_{BCM}(s))}$ is
abbreviated by $N\text{-}MOP$ throughout the proof.

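According to Lemma 4.3.2, the first implication, “$\Rightarrow$”,

$$\forall n \in N.\ N\text{-}Delayable(n) \Rightarrow Insert_{BCM}(n) \vee N\text{-}MOP(n)$$

is equivalent to

$$\begin{array}{l} \forall n \in N.\ \big(Insert_{BCM}(n) \vee \forall p \in \mathbf{P}_G[s,n[\ \exists 1 \le i \le \lambda_p.\ Insert_{BCM}(p_i) \wedge \neg Comp^{\exists}(p[i,\lambda_p])\big)\\ \qquad \Rightarrow \big(Insert_{BCM}(n) \vee \forall p \in \mathbf{P}_G[s,n[.\ [\![p]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}\big) \end{array} \qquad (4.12)$$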

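Obviously, (4.12) is trivial if n satisfies the predicate $Insert_{BCM}$. In order
to complete the proof of (4.12), it is therefore sufficient to show

$$\forall p \in \mathbf{P}_G[s,n[.\ \big(\exists 1 \le i \le \lambda_p.\ Insert_{BCM}(p_i) \wedge \neg Comp^{\exists}(p[i,\lambda_p])\big) \Rightarrow [\![p]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true} \qquad (4.13)$$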

This implication is simultaneously proved for all nodes $n \in N$ by induction
on the length k of all paths $p \in \mathbf{P}_G[s,n[$ satisfying

$$\exists 1 \le i \le \lambda_p = k.\ Insert_{BCM}(p_i) \wedge \neg Comp^{\exists}(p[i,\lambda_p]) \qquad (4.14)$$
Obviously, the case $k = 0$ does not occur, and therefore we can start with
considering a path p of length $k = 1$. In this case (4.14) delivers

$$Insert_{BCM}(p_1) \wedge \neg Comp(p_1)$$

Hence, applying Lemma 4.3.3(2), we obtain as desired

$$[\![p]\!]_{dl}(Insert_{BCM}(s)) = [\![p_1]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$

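In order to prove the induction step, let $k > 1$, and assume that (4.13) holds
for all paths q with $\lambda_q < k$, i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G[s,n[.\ 1 \le \lambda_q < k).\ \big(\exists 1 \le i \le \lambda_q.\ Insert_{BCM}(q_i) \wedge \neg Comp^{\exists}(q[i,\lambda_q])\big) \Rightarrow [\![q]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$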

It is sufficient to show that for every path $p \in \mathbf{P}_G[s,n[$ with $\lambda_p = k$ satisfying
(4.14) the following holds:

$$[\![p]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$

Thus, without loss of generality we can assume that there is such a path p.
This path can then be rewritten as $p = p';(m)$ for some $m \in pred_G(n)$ and
$p' \in \mathbf{P}_G[s,m[$. Next, two cases must be investigated.

Case 1. $Insert_{BCM}(m)$

According to (4.14), $Insert_{BCM}(m)$ directly implies $\neg Comp(m)$. Hence,
Lemma 4.3.3(2) yields as desired

$$[\![p]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$

Case 2. $\neg Insert_{BCM}(m)$

In this case, (4.14) guarantees the existence of an index $1 \le i < k$ satisfying
$Insert_{BCM}(p_i)$ and $\neg Comp^{\exists}(p[i,k])$. Thus, by the induction hypothesis (IH) we
get

$$[\![p']\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$

Applying now Lemma 4.3.3(2), we get the following sequence of equations

$$[\![p]\!]_{dl}(Insert_{BCM}(s)) = [\![m]\!]_{dl}([\![p']\!]_{dl}(Insert_{BCM}(s))) = [\![m]\!]_{dl}(\mathit{true}) = \mathit{true}$$

which completes the proof of the first implication.

The second implication, “$\Leftarrow$”,

$$\forall n \in N.\ Insert_{BCM}(n) \vee N\text{-}MOP(n) \Rightarrow N\text{-}Delayable(n)$$

holds trivially by means of Lemma 4.3.2 if n satisfies the predicate $Insert_{BCM}$.
Thus, in order to complete the proof of the second implication, it is sufficient
to show
$$\forall p \in \mathbf{P}_G[s,n[.\ \big([\![p]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}\big) \Rightarrow \exists 1 \le i \le \lambda_p.\ Insert_{BCM}(p_i) \wedge \neg Comp^{\exists}(p[i,\lambda_p]) \qquad (4.15)$$

This implication can simultaneously be proved for all nodes $n \in N$ by
induction on the length k of all paths $p \in \mathbf{P}_G[s,n[$ satisfying

$$[\![p]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true} \qquad (4.16)$$

As in the proof of the first implication, the case $k = 0$ does not occur.
Therefore, we can start with considering a path p of length $k = 1$. In this case,
(4.16) yields immediately

$$[\![p]\!]_{dl}(Insert_{BCM}(s)) = [\![p_1]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$

By means of Lemma 4.3.3(1) and (2) we thus obtain

$$Insert_{BCM}(p_1) \wedge \neg Comp(p_1)$$

which proves the induction basis.

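Let now $k > 1$, and assume that (4.15) holds for all paths q with $\lambda_q < k$,
i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G[s,n[.\ 1 \le \lambda_q < k).\ \big([\![q]\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}\big) \Rightarrow \exists 1 \le i \le \lambda_q.\ Insert_{BCM}(q_i) \wedge \neg Comp^{\exists}(q[i,\lambda_q])$$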

It is sufficient to show that for every path $p \in \mathbf{P}_G[s,n[$ with $\lambda_p = k$ satisfying
(4.16) the following holds:

$$\exists 1 \le i \le \lambda_p = k.\ Insert_{BCM}(p_i) \wedge \neg Comp^{\exists}(p[i,k])$$

Without loss of generality, we can assume that there is such a path p, which
can be rewritten as $p = p';(m)$ for some $m \in pred_G(n)$ and $p' \in \mathbf{P}_G[s,m[$.
Moreover, Lemma 4.3.3(1) yields directly $\neg Comp(m)$, since otherwise $[\![m]\!]_{dl}$, and
hence $[\![p]\!]_{dl}$, would yield false. Similar to the proof of the first implication,
we are now left with investigating two cases.

Case 1. $Insert_{BCM}(m)$

In this case the induction step follows for $i = k$.

Case 2. $\neg Insert_{BCM}(m)$

In this case we obtain by means of Lemma 4.3.3(2) and the choice of p

$$[\![p']\!]_{dl}(Insert_{BCM}(s)) = \mathit{true}$$

Applying the induction hypothesis (IH) to $p'$, we get the existence of an
index $i_{p'}$ with $1 \le i_{p'} < k$ and

$$Insert_{BCM}(p_{i_{p'}}) \wedge \neg Comp^{\exists}(p[i_{p'},k[) \qquad (4.17)$$
Combining (4.17) and the fact that m does not satisfy Comp, the induction
step follows for $i = i_{p'}$ in this case. This completes the proof of the second
implication and finishes the proof of the relevant part of Theorem 4.3.1. □

Applying Lemma 4.3.4, Lemma 4.3.5, and Theorem 4.3.1, we obtain the
desired precision of $\mathcal{A}_{dl}$. In particular, we obtain that the MFP-solution
computed by $\mathcal{A}_{dl}$ coincides with the set of all program points being delayable
in the sense of Definition 3.4.1.

Theorem 4.3.2 ($\mathcal{A}_{dl}$-Precision).

$\mathcal{A}_{dl}$ is precise for delayability, i.e., $\mathcal{A}_{dl}$ is terminating and dl-precise.



4.4 DFA-Algorithm $\mathcal{A}_{un}$: Unusability

In this section we specify the DFA-algorithm $\mathcal{A}_{un}$ for computing the set
of unusable insertion points and prove it to be precise.* In fact, this is a
consequence of the $\mathcal{A}_{un}$-Precision Theorem 4.4.2, which is the main result
of this section. It yields that $\mathcal{A}_{un}$ terminates with the set of program points
being unusable in the sense of Definition 3.4.3.



4.4.1 Specification 

Data Flow Information. The domain of data flow information of $\mathcal{A}_{un}$ is
given by the lattice of Boolean truth values

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathcal{B}, \wedge, \leq, \mathit{false}, \mathit{true})$$

where the data flow information attached to a program point indicates whether
a placement of t is unusable at this point.

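Local Semantic Functional. The local semantic functional
$[\![\ ]\!]_{un} : N \to (\mathcal{B} \to \mathcal{B})$ of $\mathcal{A}_{un}$ is defined by

$$\forall n \in N\ \forall b \in \mathcal{B}.\ [\![n]\!]_{un}(b) =_{df} Latest(n) \vee (\neg Comp(n) \wedge b)$$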

Intuitively, a placement of t is unusable at the entry of a node n if it is
latest at the entry of n (i.e., $Latest(n)$), or if t is not computed in n (i.e.,
$\neg Comp(n)$) and the placement is unusable at the exit of n (i.e., $b = \mathit{true}$).
Note that unusability, like down-safety, requires a backward analysis of the
argument program, a fact which is reflected in the definition of the local
semantic functions. For the same reason, the start information of the
unusability analysis is associated with the end node.



* The index un stands for unusability.

Start Information. The start information is given by the element

$$\mathit{true} \in \mathcal{B}$$

Intuitively, a computation cannot be used after the termination of the pro- 
gram under consideration. This is reflected in choosing true as start infor- 
mation. 

Interpretation. The interpretation of the lattice elements of $\mathcal{B}$ is given by
the identity on $\mathcal{B}$, i.e., the function $Int_{un} : \mathcal{B} \to \mathcal{B}$ is defined by

$$Int_{un} =_{df} Id_{\mathcal{B}}$$
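Since $\mathcal{A}_{un}$ is a backward analysis, a sketch has to propagate information against the control flow: the information at the exit of a node is the meet ($\wedge$) of the entry information of its successors, and $[\![n]\!]_{un}$ maps exit to entry information. In the Python sketch below, the dictionaries `latest` and `comp` are hypothetical stand-ins for Latest and Comp, and the end node is assumed to have no successors.

```python
def un_transfer(latest: bool, comp: bool, b: bool) -> bool:
    """Local semantic function [[n]]_un: exit information -> entry information."""
    return latest or ((not comp) and b)


def unusable(nodes, succs, end, latest, comp):
    """Backward fixed-point iteration for A_un; returns (entry, exit) information.

    The meet over the successors is 'and', the top element is True, and the
    exit of the end node keeps the start information True.
    """
    exit_info = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):  # backwards order suits a backward analysis
            if n == end:
                continue  # assumption: the end node has no successors
            new = all(un_transfer(latest[m], comp[m], exit_info[m]) for m in succs[n])
            if new != exit_info[n]:
                exit_info[n] = new
                changed = True
    entry_info = {n: un_transfer(latest[n], comp[n], exit_info[n]) for n in nodes}
    return entry_info, exit_info


# Hypothetical chain s -> a -> b -> e: t is latest and computed at b, and used again at e.
nodes = ["s", "a", "b", "e"]
succs = {"s": ["a"], "a": ["b"], "b": ["e"], "e": []}
latest = {"s": False, "a": False, "b": True, "e": False}
comp = {"s": False, "a": False, "b": True, "e": True}
entry_info, exit_info = unusable(nodes, succs, "e", latest, comp)
```

In this run only the exit of b and the entry of e are not unusable, because from both points the computation at e is reached without passing a latest point.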



4.4.2 Proving Precision 

Before proving the precision of $\mathcal{A}_{un}$, we present a lemma which characterizes
the local semantic functions in terms of the constant functions and the
identity on $\mathcal{B}$. This lemma, which follows from the definition of the local
semantic functional $[\![\ ]\!]_{un}$ of $\mathcal{A}_{un}$, is the key for proving the $[\![\ ]\!]_{un}$-Distributivity
Lemma 4.4.4.

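Lemma 4.4.1.

$$\forall n \in N.\ [\![n]\!]_{un} = \begin{cases} Const_{\mathit{true}} & \text{if } Latest(n)\\ Id_{\mathcal{B}} & \text{if } \neg(Latest(n) \vee Comp(n))\\ Const_{\mathit{false}} & \text{if } \neg Latest(n) \wedge Comp(n) \end{cases}$$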
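Lemma 4.4.1 admits the same exhaustive check. In the sketch below, `lemma_4_4_1` is a hypothetical helper returning the value table $(f(\mathit{false}), f(\mathit{true}))$ that the lemma predicts for a node with the given predicates.

```python
from itertools import product


def un_transfer(latest: bool, comp: bool, b: bool) -> bool:
    return latest or ((not comp) and b)


def lemma_4_4_1(latest: bool, comp: bool):
    """Expected value table (f(False), f(True)) according to Lemma 4.4.1."""
    if latest:
        return (True, True)    # Const_true
    if not comp:
        return (False, True)   # Id_B
    return (False, False)      # Const_false


for latest, comp in product([False, True], repeat=2):
    table = (un_transfer(latest, comp, False), un_transfer(latest, comp, True))
    assert table == lemma_4_4_1(latest, comp), (latest, comp)
```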

Moreover, we present an alternative version of Lemma 4.4.1, which is more 
convenient for proving the un-Precision Theorem 4.4.1. 

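Lemma 4.4.2.

1. $\forall n \in N\ \forall b \in \mathcal{B}.\ Latest(n) \Rightarrow [\![n]\!]_{un}(b) = \mathit{true}$

2. $\forall n \in N\ \forall b \in \mathcal{B}.\ \neg Latest(n) \Rightarrow \big([\![n]\!]_{un}(b) = \mathit{true} \iff \neg Comp(n) \wedge b = \mathit{true}\big)$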

Descending Chain Condition. Obviously, the finiteness of $\mathcal{B}$ implies:

Lemma 4.4.3 (Descending Chain Condition).

The lattice $(\mathcal{B}, \wedge, \leq, \mathit{false}, \mathit{true})$ satisfies the descending chain condition.

Distributivity. By means of Lemma 4.4.1 and Lemma 4.1.2 we obtain as
desired:

Lemma 4.4.4 ($[\![\ ]\!]_{un}$-Distributivity).

The local semantic functions $[\![n]\!]_{un}$, $n \in N$, are distributive.
un-Precision. The last step in proving the precision of $\mathcal{A}_{un}$ is to prove
that unusability in the sense of Definition 3.4.3 and the meet over all paths
solution of $\mathcal{A}_{un}$ coincide. We obtain this coincidence in the same fashion as in
the previous sections. It guarantees that $\mathcal{A}_{un}$ is un-precise, as expressed
by Theorem 4.4.1. For the same reasons as in the previous sections, this time
we will prove only the second part of this theorem.

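Theorem 4.4.1 (un-Precision).

For all nodes $n \in N$ we have:

1. $N\text{-}Unusable(n)$ if and only if $Int_{un}(X\text{-}MOP_{([\![\ ]\!]_{un},\, \mathit{true})}(n))$

2. $X\text{-}Unusable(n)$ if and only if $Int_{un}(N\text{-}MOP_{([\![\ ]\!]_{un},\, \mathit{true})}(n))$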

Proof.

As mentioned before, we will only prove the second part of Theorem 4.4.1. In
order to simplify the notation, we abbreviate $Int_{un} \circ N\text{-}MOP_{([\![\ ]\!]_{un},\, \mathit{true})}$ by
$N\text{-}MOP$ throughout the proof.

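The first implication, “$\Rightarrow$”,

$$\forall n \in N.\ X\text{-}Unusable(n) \Rightarrow N\text{-}MOP(n)$$

is proved by showing the even stronger implication

$$\forall p \in \mathbf{P}_G]n,e].\ \big(\forall 1 \le i \le \lambda_p.\ Comp(p_i) \Rightarrow Latest^{\exists}(p[1,i])\big) \Rightarrow [\![p]\!]_{un}(\mathit{true}) = \mathit{true} \qquad (4.18)$$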

This implication is now simultaneously proved for all nodes $n \in N$ by
induction on the length k of all paths $p \in \mathbf{P}_G]n,e]$ satisfying

$$\forall 1 \le i \le \lambda_p = k.\ Comp(p_i) \Rightarrow Latest^{\exists}(p[1,i]) \qquad (4.19)$$

For $k = 0$, we obtain $p = \varepsilon$, and

$$[\![p]\!]_{un}(\mathit{true}) = [\![\varepsilon]\!]_{un}(\mathit{true}) = \mathit{true}$$

holds trivially.

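In order to prove the induction step, let $k > 0$, and assume that (4.18)
holds for all paths q with $\lambda_q < k$, i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G]n,e].\ 0 \le \lambda_q < k).\ \big(\forall 1 \le i \le \lambda_q.\ Comp(q_i) \Rightarrow Latest^{\exists}(q[1,i])\big) \Rightarrow [\![q]\!]_{un}(\mathit{true}) = \mathit{true}$$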

It is sufficient to show that for each path $p \in \mathbf{P}_G]n,e]$ with $\lambda_p = k$ satisfying
(4.19) the following holds:

$$[\![p]\!]_{un}(\mathit{true}) = \mathit{true}$$

Hence, we can assume that there is such a path p, which can then be rewritten
as $p = (m);p'$ for some $p' \in \mathbf{P}_G]m,e]$ with $m \in succ_G(n)$. Now the following
two cases must be investigated.







Case 1. $Latest(m)$

In this case, Lemma 4.4.2(1) yields as desired

$$[\![p]\!]_{un}(\mathit{true}) = [\![m]\!]_{un}([\![p']\!]_{un}(\mathit{true})) = \mathit{true}$$

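Case 2. $\neg Latest(m)$

In this case, (4.19) applied to $i = 1$ delivers $\neg Comp(m)$, because $p_1 = m$ and
$Latest^{\exists}(p[1,1]) = Latest(m)$ does not hold. Moreover, (4.19) guarantees

$$\forall i \ge 2.\ Comp(p_i) \Rightarrow Latest^{\exists}(p[2,i])$$

Thus, applying the induction hypothesis (IH) to $p'$ yields

$$[\![p']\!]_{un}(\mathit{true}) = \mathit{true}$$

Due to $\neg Latest(m)$ and $\neg Comp(m)$, we now obtain by means of Lemma
4.4.2(2)

$$[\![p]\!]_{un}(\mathit{true}) = [\![m]\!]_{un}([\![p']\!]_{un}(\mathit{true})) = [\![m]\!]_{un}(\mathit{true}) = \mathit{true}$$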

This completes the proof of the first implication. 

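The second implication, “$\Leftarrow$”,

$$\forall n \in N.\ N\text{-}MOP(n) \Rightarrow X\text{-}Unusable(n)$$

is proved by showing the even stronger implication

$$\forall p \in \mathbf{P}_G]n,e].\ \big([\![p]\!]_{un}(\mathit{true}) = \mathit{true}\big) \Rightarrow \big(\forall 1 \le i \le \lambda_p.\ Comp(p_i) \Rightarrow Latest^{\exists}(p[1,i])\big) \qquad (4.20)$$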

It will simultaneously be proved for all nodes $n \in N$ by induction on the
length k of all paths $p \in \mathbf{P}_G]n,e]$ satisfying

$$[\![p]\!]_{un}(\mathit{true}) = \mathit{true} \qquad (4.21)$$

As in the proof of the first implication, the case k = 0 holds trivially. 

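Thus, let $k > 0$, and assume that (4.20) holds for all paths q with $\lambda_q < k$,
i.e.,

$$\text{(IH)}\quad (\forall q \in \mathbf{P}_G]n,e].\ 0 \le \lambda_q < k).\ \big([\![q]\!]_{un}(\mathit{true}) = \mathit{true}\big) \Rightarrow \big(\forall 1 \le i \le \lambda_q.\ Comp(q_i) \Rightarrow Latest^{\exists}(q[1,i])\big)$$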

It is sufficient to show that for every path $p \in \mathbf{P}_G]n,e]$ with $\lambda_p = k$ satisfying
(4.21) the following holds:

$$\forall 1 \le i \le \lambda_p = k.\ Comp(p_i) \Rightarrow Latest^{\exists}(p[1,i])$$

We can therefore assume that there is such a path p, which then can be
rewritten as $p = (m);p'$ for some $p' \in \mathbf{P}_G]m,e]$ with $m \in succ_G(n)$. Similar
to the proof of the first implication, we are left with investigating two cases.
Case 1. $Latest(m)$

In this case the induction step follows trivially because we have $Latest(p_1)$.

Case 2. $\neg Latest(m)$

Here we obtain by means of (4.21) and Lemma 4.4.2(2)

$$\neg Comp(m) \wedge [\![p']\!]_{un}(\mathit{true}) = \mathit{true} \qquad (4.22)$$

Applying the induction hypothesis (IH) to p' then yields the induction step 
as desired. This completes the proof of the second implication, and finishes 
the proof of the relevant part of Theorem 4.4.1. □ 

By means of Lemma 4.4.3, Lemma 4.4.4, and Theorem 4.4.1, we get the
desired precision of $\mathcal{A}_{un}$. In particular, this implies that the MFP-solution
computed by $\mathcal{A}_{un}$ coincides with the set of all program points where a
computation of t would be unusable in the sense of Definition 3.4.3.

Theorem 4.4.2 ($\mathcal{A}_{un}$-Precision).

$\mathcal{A}_{un}$ is precise for unusability, i.e., $\mathcal{A}_{un}$ is terminating and un-precise.




5. The Programming Language 



In this chapter we introduce the programming language which we consider
during the development and application of our framework for optimal
interprocedural program optimization. The syntactic features of this language,
called Prog, represent the common core of Algol-like programming languages
(e.g., Algol, Pascal, Modula, etc.), for which our framework is primarily designed.



5.1 Programs 

The programming language Prog is an Algol-like imperative programming
language. Prog-programs $\Pi$ have statically nested procedures, global and
local variables, and formal value, reference, and procedure parameters. They are
represented as structured systems $\Pi = (\pi_1, \ldots, \pi_k)$ of (mutually recursive)
procedure definitions, where every procedure $\pi \in \Pi$ has a list of formal value
parameters $f_1, \ldots, f_q$, $q \ge 0$, a list of formal reference parameters $z_1, \ldots, z_r$,
$r \ge 0$, a list of formal procedure parameters $\phi_1, \ldots, \phi_s$, $s \ge 0$, and a list of
local variables $v_1, \ldots, v_u$, $u \ge 0$. Additionally, there may be occurrences of
external variables and external procedures in a Prog-program.

5.1.1 Syntax 

The syntax of Prog-programs is basically given by the context-free-like
production rules given below. Angle brackets are used for expressing the static
nesting of procedures. The non-terminal stmt-part stands for the statement
part of a procedure. Its elementary components are parallel assignments and
procedure calls, as indicated by the last three rules.

The procedure generated by the rule for main-proc is considered the
main procedure (main program) of the corresponding Prog-program. It does
not have formal parameters and cannot be called by other procedures. For
clarity, we will sometimes add mode information to formal procedure
parameters using the keywords val, ref, and proc, indicating a value, reference,
and procedure parameter, respectively. For example, in the declaration of the
procedure $\pi$ below

$$\pi(\ :\ :\ \phi(val, val : ref : proc(\ : ref : proc)))$$
the mode information attached to the parameter $\phi$ identifies it as a formal
procedure having two value, one reference, and one procedure parameter,
which itself has a reference and a parameterless procedure parameter.



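$$\begin{array}{lcl}
\textit{program} & ::= & \langle \textit{main-proc} : \textit{block} \rangle\\
\textit{main-proc} & ::= & \pi\\
\textit{block} & ::= & \textit{var-part}\ \textit{stmt-part} \mid \textit{var-part}\ \langle \textit{proc-part} \rangle\ \textit{stmt-part}\\
\textit{var-part} & ::= & \textit{empty} \mid v_1, \ldots, v_u :\\
\textit{proc-part} & ::= & \textit{procedure} : \textit{block} \mid \textit{procedure} : \textit{block},\ \textit{proc-part}\\
\textit{stmt-part} & ::= & \text{“statement part”}\\
\textit{procedure} & ::= & \pi(f_1, \ldots, f_q : z_1, \ldots, z_r : \phi_1, \ldots, \phi_s)\\
\textit{empty} & ::= & \varepsilon\\
\textit{elem-stmt} & ::= & \textit{ass-stm} \mid \textit{call-stm}\\
\textit{ass-stm} & ::= & (x_1, \ldots, x_n) := (t_1, \ldots, t_n)\\
\textit{call-stm} & ::= & \textbf{call}\ \pi(t_1, \ldots, t_q : x_1, \ldots, x_r : \psi_1, \ldots, \psi_s)
\end{array}$$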
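The grammar can be mirrored by a small abstract syntax in Python. The sketch below is only one possible encoding, chosen for illustration: the class names and the helper `is_valid_main` are hypothetical, terms are kept as strings, and the statement part, which the grammar leaves abstract, is simplified to a list of elementary statements.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Assign:
    """ass-stm: (x_1, ..., x_n) := (t_1, ..., t_n); terms are kept as strings."""
    lhs: List[str]
    rhs: List[str]

    def __post_init__(self) -> None:
        if len(self.lhs) != len(self.rhs):
            raise ValueError("a parallel assignment needs as many terms as variables")


@dataclass
class Call:
    """call-stm: call pi(t_1, ..., t_q : x_1, ..., x_r : psi_1, ..., psi_s)."""
    callee: str
    value_args: List[str] = field(default_factory=list)
    ref_args: List[str] = field(default_factory=list)
    proc_args: List[str] = field(default_factory=list)


Stmt = Union[Assign, Call]


@dataclass
class Procedure:
    """procedure : block, i.e. header, var-part, nested proc-part and stmt-part."""
    name: str
    value_params: List[str] = field(default_factory=list)
    ref_params: List[str] = field(default_factory=list)
    proc_params: List[str] = field(default_factory=list)
    local_vars: List[str] = field(default_factory=list)
    nested: List["Procedure"] = field(default_factory=list)
    body: List[Stmt] = field(default_factory=list)


def is_valid_main(proc: Procedure) -> bool:
    """The main procedure must not have formal parameters."""
    return not (proc.value_params or proc.ref_params or proc.proc_params)


# A hypothetical main procedure pi_1 with one nested procedure pi_11.
pi_11 = Procedure("pi_11", value_params=["f1"], ref_params=["z1"],
                  body=[Assign(["z1"], ["f1 + 1"])])
main = Procedure("pi_1", local_vars=["x", "y"], nested=[pi_11],
                 body=[Call("pi_11", value_args=["y"], ref_args=["x"])])
assert is_valid_main(main)
```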





Internal and External Identifiers. We split the set of identifiers occurring
in a Prog-program $\Pi$ into the sets of internal and external identifiers.
An identifier is called internal if it has a defining occurrence inside $\Pi$, and
external otherwise. Similarly, a procedure call in $\Pi$ is called internal if the
identifier denoting the called procedure is an internal identifier. Otherwise,
it is called external. Additionally, we distinguish ordinary and formal
procedure calls. A procedure call in $\Pi$ is called formal if the defining occurrence
of the identifier denoting the called procedure occurs in the formal parameter
list of a procedure of $\Pi$, and it is called ordinary otherwise. Finally,
$Ext(\Pi)$ denotes the set of all external identifiers of $\Pi$. It is composed of the
sets of external variable and procedure identifiers, denoted by $Ext_V(\Pi)$ and
$Ext_P(\Pi)$, respectively.

Identifier Bindings. We assume that Prog obeys the block structuring and
identifier binding and visibility rules of Algol-like programming languages
with static scoping. Accordingly, every procedure of a program $\Pi$, including
its main procedure, encloses the set of its static successors. The local variables
of a procedure are therefore global variables of all its static successors
and can be accessed by them. In particular, external identifiers are considered
global identifiers of all procedures of $\Pi$. Thus, external variables and external
procedures can be accessed and called by every procedure like a global
variable and an ordinary procedure, respectively. Internal procedures except
for the main procedure can be called by (other) internal procedures according
to the visibility rules of procedure identifiers. In particular, procedures of the
same static level which have the same static predecessor can call each other
mutually recursively.

Distinguished Programs and Correct Procedure Calls. We call a
Prog-program $\Pi$ distinguished if the sets of internal and external identifiers
are disjoint and if defining occurrences in $\Pi$ use different identifiers.
An ordinary internal procedure call of $\Pi$ is correct if the numbers of value,
reference, and procedure parameters of the call statement coincide with the
corresponding numbers of the formal parameters of the procedure called. We
remark that this definition can naturally be extended to formal procedure
calls, if mode information is given.
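Both conditions are simple to check mechanically. In the hypothetical Python sketch below, a program is summarized by the list of identifiers at their defining occurrences (which are exactly the internal identifiers) and the set of external identifiers, and parameter lists are summarized by (value, reference, procedure) count triples.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def is_distinguished(defining_occurrences: Iterable[str], external: Set[str]) -> bool:
    """Distinguished: no identifier is defined twice, and internal and
    external identifiers are disjoint."""
    counts = Counter(defining_occurrences)
    unique = all(c == 1 for c in counts.values())
    return unique and counts.keys().isdisjoint(external)


def is_correct_call(formal: Tuple[int, int, int], actual: Tuple[int, int, int]) -> bool:
    """An ordinary internal call is correct if the numbers of value, reference,
    and procedure parameters coincide; both arguments are (q, r, s) triples."""
    return tuple(formal) == tuple(actual)


# Hypothetical data: pi_11 has one value and one reference parameter.
print(is_distinguished(["pi_1", "pi_11", "x", "y", "f1", "z1"], {"write"}))
print(is_correct_call((1, 1, 0), (1, 1, 0)), is_correct_call((1, 1, 0), (2, 1, 0)))
```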

5.1.2 Notations 

In order to simplify the argumentation about programs and procedures, we 
introduce a number of further notations. 

Identifiers, Operators, and Terms. We denote the set of identifiers of
Prog by Idf. It is composed of the disjoint sets of variable identifiers V and
procedure identifiers P, which itself consists of the disjoint sets of ordinary
procedure identifiers OP and formal procedure identifiers FP. Usually, we
omit the term “identifier” and simply speak of variables and procedures.
Identifiers range preferably over lower case Greek letters, e.g., $\iota$ or $\kappa$. More
specifically, we usually denote variables by lower case Latin letters (preferably
x, y, z, ...), and procedures by lower case Greek letters (preferably $\pi, \pi', \ldots$ for
ordinary procedures, and $\phi, \phi', \ldots$ for formal procedures). Additionally, we
assume a set of operators O and a set of terms $t \in T$, which are inductively
built from variables and operators.

$\mathcal{I}$, Stmt, CalledProc, ParLst, and Other Functions. The function $\mathcal{I}$
denotes a polymorphic function which maps its arguments (e.g., programs,
flow graphs, statements, etc.) to the set of identifiers having an occurrence
in them. Superscripting it with “d”, the resulting function $\mathcal{I}^d$ maps its arguments
to the set of identifiers having a defining occurrence in them. Subscripting
$\mathcal{I}$ and $\mathcal{I}^d$ with a set of identifiers means to restrict the result of $\mathcal{I}$ and $\mathcal{I}^d$
for a given argument to the identifiers belonging to the specified subset.
For example, $\mathcal{I}^d_{FP}(\pi)$ denotes the set of formal procedure
identifiers having a defining occurrence in procedure $\pi$.

Additionally, we introduce the polymorphic function Stmt defined on
programs and procedures of Prog, which maps its arguments to the multiset
of elementary statements occurring in them. Subscripting Stmt with a certain
statement type means to project the result of Stmt to those statements
matching the specified statement type. For example, $Stmt_{call}(\pi)$ denotes the
set of procedure call statements occurring in procedure $\pi$. Moreover, for every
call statement $st \in Stmt_{call}(\Pi)$, $CalledProc(st)$ denotes the identifier
of the procedure called by st. Similarly, the polymorphic function ParLst
yields the parameter list of its argument, i.e., the formal parameter list of a
procedure declaration and the actual parameter list of a procedure call.
Additionally, $\downarrow_i$ is a polymorphic projection function defined on lists, which maps
an argument list to its i-th component. The function LhsVar maps every
assignment statement $m = (x_1, \ldots, x_n) := (t_1, \ldots, t_n)$ to the set of variables
occurring on the left-hand side of m, i.e., $LhsVar(m) =_{df} \{x_1, \ldots, x_n\}$.
Furthermore, we introduce for every program $\Pi = (\pi_1, \ldots, \pi_k) \in Prog$
the functions decl, pos, c-occ, p-occ, and occ. Intuitively, decl maps every
internal identifier of a program to the procedure of its declaration, pos every
formal procedure identifier to the position of its defining occurrence in the
relevant formal parameter list, c-occ every procedure identifier to the set
of call statements invoking it, p-occ every procedure identifier to the set
of call statements where it is passed as argument, and occ every procedure
identifier to the set of call statements in which it occurs. As usual, $\mathbb{N}$ denotes
the set of natural numbers starting with 0, and $\mathcal{P}$ the powerset operator.

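1. $decl : \mathcal{I}(\Pi) \to \Pi$ with $decl(\iota) =_{df} \begin{cases} \pi & \text{if } \iota \in \mathcal{I}^d(\pi)\\ \text{undef} & \text{otherwise} \end{cases}$

2. $pos : \mathcal{I}_{FP}(\Pi) \to \mathbb{N}$ with $pos(\iota) =_{df} j$ iff $\iota = ParLst(decl(\iota))\downarrow_j$

3. $c\text{-}occ : \mathcal{I}_P(\Pi) \to \mathcal{P}(Stmt_{call}(\Pi))$ with
$c\text{-}occ(\iota) =_{df} \{\, st \mid st \in Stmt_{call}(\Pi) \wedge CalledProc(st) = \iota \,\}$

4. $p\text{-}occ : \mathcal{I}_P(\Pi) \to \mathcal{P}(Stmt_{call}(\Pi))$ with
$p\text{-}occ(\iota) =_{df} \{\, st \mid st \in Stmt_{call}(\Pi) \wedge \iota \in ParLst(st) \,\}$

5. $occ : \mathcal{I}_P(\Pi) \to \mathcal{P}(Stmt_{call}(\Pi))$ with $occ(\iota) =_{df} c\text{-}occ(\iota) \cup p\text{-}occ(\iota)$.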
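The functions decl, pos, c-occ, p-occ, and occ can be sketched over a toy program representation. The representation is an assumption made for illustration: every procedure lists its formal parameters as one flat list (so positions count across value, reference, and procedure parameters) together with the identifiers it declares locally, call statements carry a hypothetical label to keep them distinct, and `None` plays the role of undef.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class CallStmt:
    label: str              # hypothetical label that keeps call statements distinct
    callee: str             # CalledProc(st)
    args: Tuple[str, ...]   # ParLst(st), flattened


@dataclass
class Proc:
    name: str
    params: List[str]                                    # ParLst of the declaration, flattened
    local_defs: List[str] = field(default_factory=list)  # other identifiers declared here
    calls: List[CallStmt] = field(default_factory=list)  # call statements of this procedure


Program = Dict[str, Proc]


def decl(prog: Program, ident: str) -> Optional[str]:
    """Procedure holding the defining occurrence of ident; None stands for undef."""
    for proc in prog.values():
        if ident in proc.params or ident in proc.local_defs:
            return proc.name
    return None


def pos(prog: Program, ident: str) -> int:
    """1-based position j with ParLst(decl(ident)) projected to j equal to ident."""
    return prog[decl(prog, ident)].params.index(ident) + 1


def stmt_call(prog: Program) -> List[CallStmt]:
    return [st for proc in prog.values() for st in proc.calls]


def c_occ(prog: Program, ident: str) -> Set[CallStmt]:
    return {st for st in stmt_call(prog) if st.callee == ident}


def p_occ(prog: Program, ident: str) -> Set[CallStmt]:
    return {st for st in stmt_call(prog) if ident in st.args}


def occ(prog: Program, ident: str) -> Set[CallStmt]:
    return c_occ(prog, ident) | p_occ(prog, ident)


# Hypothetical program: pi_1 declares x, pi_11 and pi_12; pi_11 has the
# formal procedure parameter phi and calls it.
s1 = CallStmt("s1", "pi_11", ("x", "pi_12"))
s2 = CallStmt("s2", "phi", ())
prog = {
    "pi_1": Proc("pi_1", [], ["x", "pi_11", "pi_12"], [s1]),
    "pi_11": Proc("pi_11", ["a", "phi"], [], [s2]),
    "pi_12": Proc("pi_12", []),
}
```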

For every procedure $\pi$ of $\Pi$, we denote its static predecessor by $StatPred(\pi)$.
As usual, $StatPred^+$ and $StatPred^*$ denote the transitive and the reflexive-transitive
closure of StatPred, respectively.* Finally, $\mathbb{N}^*$ denotes the set of
lists (i.e., sequences) of natural numbers, which we usually denote by barred
lower case Greek letters, preferably by $\bar{\omega}$. In particular, the empty sequence is
denoted by $\varepsilon$.

5.1.3 Conventions 

In order to simplify the notation, we denote the procedures of a program $\Pi \in
Prog$ by identifiers of the form $\pi_{\bar{\omega}}$, where the index $\bar{\omega} \in \mathbb{N}^*$, a finite sequence
of natural numbers, is uniquely determined. The length of $\bar{\omega}$ encodes the
static level on which the corresponding procedure is declared; the longest proper
prefix of $\bar{\omega}$ gives the index of its unique static predecessor. Thus, given the
index of a procedure $\pi$, the set of all its proper prefixes yields the set of
all static predecessors of $\pi$. Similarly, the set of indexes of a given length l
determines the set of procedures of the static level l. In particular, the main
procedure is assumed to have static level 1. Thus, it is denoted by $\pi_1$.

^ Note that StatPred induces a relation on 77. 
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The following example illustrates our naming convention. 

$\Pi = \langle \pi_1 \langle \pi_{11}, \pi_{12} \langle \pi_{121}, \pi_{122} \langle \pi_{1221} \rangle \rangle, \pi_{13} \langle \pi_{131} \rangle \rangle \rangle$

$\Pi$ is a program with the following nesting structure of procedures.

    Π = (π_1 :
          (π_11 : ... ,
           π_12 ( ... : ... : ... ) :
             (π_121 ( ... : ... : ... ) : ... ,
              π_122 ( ... : ... : ... ) :
                (π_1221 ( ... : ... : ... ) : ...) ...) ... ,
           π_13 ( ... : ... : ... ) :
             (π_131 ( ... : ... : ... ) : ...) ...)
          ...)

Note that according to our naming convention, the nesting structure of the procedures of a program is uniquely encoded in the procedure names. Therefore, we will usually omit the angle brackets. For example, for the program $\Pi$ we will simply write:

$\Pi = \langle \pi_1, \pi_{11}, \pi_{12}, \pi_{121}, \pi_{122}, \pi_{1221}, \pi_{13}, \pi_{131} \rangle$

In the following we restrict our attention to Prog-programs which are distinguished and where all procedure calls are correct.
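Because the index of each procedure encodes its nesting, static levels and static predecessors can be read off the index alone. A minimal Python sketch, assuming indexes are represented as tuples of integers (so that, e.g., $\pi_{1221}$ becomes `(1, 2, 2, 1)` and multi-digit components stay unambiguous); the function names are hypothetical:

```python
def static_level(index):
    """Static level of pi_index: the length of its index."""
    return len(index)


def static_pred(index):
    """Index of the unique static predecessor (longest proper prefix); None for the main procedure."""
    return index[:-1] if len(index) > 1 else None


def static_preds(index):
    """Indexes of all static predecessors: the nonempty proper prefixes of index."""
    return [index[:i] for i in range(1, len(index))]


def procedures_of_level(program, level):
    """All procedure indexes of the given static level."""
    return [w for w in program if len(w) == level]


# The example program <pi_1, pi_11, pi_12, pi_121, pi_122, pi_1221, pi_13, pi_131>.
PI = [(1,), (1, 1), (1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 2, 1), (1, 3), (1, 3, 1)]
```

For instance, `static_preds((1, 2, 2, 1))` yields the indexes of $\pi_1$, $\pi_{12}$ and $\pi_{122}$, and `procedures_of_level(PI, 2)` yields those of $\pi_{11}$, $\pi_{12}$ and $\pi_{13}$.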

5.2 Sublanguages 

An important classification of Algol-like programming languages is induced by the notion of mode depth (cf. [Ar3, La5, LLW]). Following [Ar3], the mode depth $\mathcal{MD}(f)$ of an ordinary or a formal procedure identifier $f$ without any procedural arguments or parameters occurring in a call statement or declaration, respectively, is defined to be 1. The mode depth of a procedure identifier $g$ occurring in a context of the form $g(\,:\,:\,h_1, \ldots, h_k)$ is then inductively defined by

$$\mathcal{MD}(g) =_{df} \mathit{Max}(\{\, \mathcal{MD}(h_j) \mid j \in \{1, \ldots, k\} \,\}) + 1$$

This notion can easily be extended to programs. The mode depth of a Prog-program $\Pi$ is given by the maximal mode depth of its procedure identifiers:

$$\mathcal{MD}(\Pi) =_{df} \mathit{Max}(\{\, \mathcal{MD}(\iota) \mid \iota \in \mathcal{I}_P(\Pi) \,\})$$
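The inductive definition translates directly into a recursive function. The Python sketch below assumes, for illustration only, that the mode of a procedure identifier is given as a tuple of the modes of its procedural parameters (non-procedural parameters are omitted), so that an identifier without procedural parameters has the empty mode `()`; the names are hypothetical.

```python
def mode_depth(mode):
    """MD of an identifier whose mode is the tuple of its procedural parameters' modes."""
    if not mode:                       # no procedural arguments or parameters
        return 1
    return max(mode_depth(m) for m in mode) + 1


def program_mode_depth(modes):
    """MD(Pi): the maximal mode depth of all procedure identifiers of Pi."""
    return max(mode_depth(m) for m in modes.values())


# A procedure p with one procedural parameter q that has no procedural parameters.
depth_two = {"p": ((),), "q": ()}
# A procedure p whose first procedural parameter f itself takes a procedure.
depth_three = {"p": (((),), ()), "f": ((),), "g": ()}
```

Under this encoding `program_mode_depth(depth_two)` is 2, the situation of formal procedure parameters without procedural parameters, and `program_mode_depth(depth_three)` is 3.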

The notion of mode depth divides the family of Algol-like programming languages into two major groups: languages allowing programs with infinite modes like Algol68 [vWMPK, vWMPKSLMF] and languages allowing programs with finite modes only like ISO-Pascal [ISO]. The importance of this classification stems from the fact that a variety of problems which are relevant for the construction of a compiler, like formal reachability or formal recursivity of a procedure, are decidable for finite, but undecidable for infinite mode languages [Ar1, La1].² Concerning interprocedural program optimization, the problem of formal reachability and its inherent complexity is particularly important, as we have to consider a refined version of formal reachability, called formal callability, in order to avoid worst-case assumptions for formal procedure calls during optimization (cf. Chapter 6). In the general case of arbitrary finite mode languages formal reachability is P-space hard [Ar3, Wi2]; for languages of mode depth 2, however, formal reachability is decidable in polynomial time if there is a limit on the length of parameter lists [Ar3]. Intuitively, mode depth 2 means that formal procedure parameters do not have procedures as parameters. Note that the original version of Wirth's Pascal [Wth, HW] is a prominent example of an Algol-like programming language of mode depth 2. In addition to these results on finite mode languages, the following result of [La1] is also important for us: formal reachability is decidable for subsets of Algol-like languages which do not allow procedures with global formal procedure parameters. This gives rise to the definition of the following sublanguages of Prog, which we consider in more detail in the following chapter dealing with higher order data flow analysis:

1. $\mathit{Prog}_{fm(k)}$: the set of Prog-programs with mode depth $m \le k$, $k \in \mathbb{N}$,

2. $\mathit{Prog}_{wgfpp}$: the set of Prog-programs without global formal procedure parameters,

3. $\mathit{Prog}_{fm(k),wgfpp}$: the set of Prog-programs with mode depth $k$ without global formal procedure parameters.

² The practical impact of these results for the design and the construction of compilers is discussed in [La1].

5.3 Separate Compilation and Separate Optimization

Large software systems are typically constructed and organized as a hierarchy of subsystems (modules) in order to support structured and reliable programming. "State-of-the-art" languages support this approach of "programming-in-the-large" by allowing the composition of programs out of modules, which import and export procedures from and to other modules, respectively. This has led to the development of compilers offering separate compilation, i.e., the ability to compile single modules as well as complete programs. Besides simplifying the software development process, separate compilation also supports the reuse of software components (modules). However, in order to exploit the full power of separate compilation, it must be enhanced with methods for separate optimization in order to support the generation of efficient object code also for separately compiled program modules (cf. [Bu, BC2, SW]). Technically, separate optimization can be organized by means of an optimizer library system, which stores, manages, and maintains information on previously analysed programs and modules, and intelligently supports the retrieval of this information to enhance the optimization of other modules (cf. [CKT1, CKT2]). In fact, this approach allows a proper treatment of external procedures, which are imported from other modules and whose implementation is invisible to the module currently under optimization. The two extremes here are as follows: first, the library fully specifies the semantics of an external procedure with respect to the application under consideration; second, the library leaves the semantics of an external procedure completely unspecified, i.e., concerning the application it provides no (positive) information on its behaviour. In this case the external procedure is treated by means of a worst-case assumption. Clearly, the correctness and the optimality of a concrete optimization hold relative to the correctness of the information stored in the library system. It is worth noting that besides separate optimization, incremental optimization is also supported by this approach, i.e., after changing a program module it is sufficient to reoptimize the modified module and those that hierarchically depend on it.
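Determining which modules must be reoptimized after a change amounts to a reachability search over the reverse import relation. A minimal Python sketch, assuming (for illustration) that the module hierarchy is given as a mapping from each module to the modules it imports, and reading "hierarchically depend" as depending directly or transitively; the function and module names are hypothetical:

```python
from collections import deque


def modules_to_reoptimize(imports, changed):
    """Return the changed module together with all modules depending on it.

    imports: module -> set of modules it imports from (hypothetical representation)
    """
    dependents = {}
    for module, used in imports.items():
        for dep in used:
            dependents.setdefault(dep, set()).add(module)
    result, work = {changed}, deque([changed])
    while work:                        # breadth-first search over reverse imports
        module = work.popleft()
        for user in dependents.get(module, ()):
            if user not in result:
                result.add(user)
                work.append(user)
    return result


IMPORTS = {"main": {"io", "util"}, "util": {"io"}, "net": {"util"}, "io": set()}
```

Changing `io` then requires reoptimizing all four modules, whereas changing `net` affects only `net` itself.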

Our framework for interprocedural program optimization takes separate optimization into account by offering an interface to an optimizer library system. However, it can also be used in the absence of a library system, because information on external procedures can also be fed in manually. The extreme case that nothing is assumed about (some of) the external procedures is taken care of by extending the program (module) $\Pi$ under consideration with a fictitious procedure $\pi_0$, representing the unspecified external procedures occurring in the environment of $\Pi$. In particular, all unspecified external variables of $\Pi$ are considered local variables of $\pi_0$, and all procedure calls of $\Pi$ to unspecified external procedures are considered ordinary procedure calls of $\pi_0$. The procedure $\pi_0$ is then treated by means of a worst-case assumption in order to guarantee the safety of the DFA-results. This leads to a uniform treatment of internal and external procedure calls in our framework, even if their semantics is unspecified. For simplicity, the attribute "unspecified" is omitted in the following, and the term external procedure always denotes an external procedure without a specified semantics.
In this chapter we present the higher order data flow analysis (HO-DFA) of our framework. It allows a proper treatment of formal procedure calls in interprocedural program optimization by avoiding worst-case assumptions for them. The intent of the HO-DFA is to compute for every formal procedure call in a program the set of procedures which may actually be called by it. In IDFA, this information can be used by interpreting formal procedure calls as nondeterministic higher-order branch statements. Intuitively, this is closely related to approaches for constructing the procedure call graph of a program (cf. [CCHK, HK, Wal, Lak, Ry]).¹ These approaches, however, are mostly heuristically based and concentrate on the correctness (safety) of the analysis results. They do not systematically deal with precision. In contrast, investigating both correctness and precision, and demonstrating the theoretical and practical limitations of computing the set of procedures which may be called by a formal procedure call, is an important point of our approach. Technically, we proceed by showing that computing the set of procedures which may be called by a formal procedure call is a refinement of the well-known formal reachability problem (cf. [La1]). We therefore call the refined problem the formal callability problem. The undecidability of formal reachability in general (cf. [La1]) directly implies that formal callability is also undecidable in general. Thus, in order to make our approach practical, we introduce an approximation of formal callability, which we call potential passability. We prove that it is a correct approximation of formal callability which can be computed efficiently. Moreover, for programs of mode depth 2 without global formal procedure parameters, potential passability and formal callability coincide.

Conceptually as well as technically, the HO-DFA computing potential passability is embedded as a preprocess in our framework for interprocedural program optimization. This is important because all details of this preprocess can be hidden from the designer of an optimization. The construction of an interprocedural optimization proceeds as for programs without formal procedure calls. The results of the HO-DFA are automatically taken care of by interpreting formal procedure calls, according to these results, as higher-order branch statements of ordinary procedure calls during IDFA. Formal procedure calls are thereby handled in a most recent fashion. In Chapter 10 we will thus require that the argument program satisfies the strong formal most recent property for guaranteeing the correctness of the IDFA-results (cf. [Ka]). It is worth noting, however, that the HO-DFA does not rely on this premise. It applies to all programs of Prog.

¹ In [Lak] a different setup is considered with procedure valued variables instead of formal procedure parameters. The algorithm of [Ry] is restricted to programs without recursion.



6.1 Formal Callability: Formal Reachability Revisited and Refined

Intuitively, callability is concerned with the question which procedures can be called by a procedure call statement at run-time. While this is not at all a problem for ordinary procedure calls, as the called procedure is unique and statically known, it is in general undecidable for formal procedure calls (cf. Theorem 6.1.2). The distinctive difference is that the procedure actually called by a formal procedure call is given by the argument which is bound to the formal procedure when reaching the call site. This, however, depends on the particular program execution reaching the call site, as illustrated in Example 6.1.1.

Example 6.1.1. Let $\Pi \in \mathit{Prog}_{fm(2)}$ be the program given below, where we assume that the three call statements in the body of $\pi_{13}$ can be executed sequentially.² Note that all formal parameters occurring in $\Pi$ are procedure parameters, and that the formal procedure parameter $\phi_2$ is a global formal procedure parameter of the procedures $\pi_{133}$ and $\pi_{134}$.

    (π_1 :
      (π_11 : ... ,
       π_12 : ... ,
       π_13 ( : : φ_1, φ_2) :
         (π_131 : ... ,
          π_132 : ... ,
          π_133 : ... ; φ_2; ... ,
          π_134 : ... ; φ_2; ... )
         ... ; π_13( : : π_133, π_131); ... ; π_13( : : π_134, π_132); ... ; φ_1; ... )
      π_13( : : π_11, π_12) )

² This can be achieved by adding appropriate branch instructions.

Considering the program $\Pi$ of Example 6.1.1, it is easy to check that the formal procedure call of $\phi_1$ in $\pi_{13}$ calls $\pi_{11}$, $\pi_{133}$ and $\pi_{134}$, and that the formal procedure calls of $\phi_2$ in $\pi_{133}$ and in $\pi_{134}$ call the procedures $\pi_{131}$ and $\pi_{132}$, respectively. In general, however, it is not decidable which procedures are actually called by a formal call statement at run-time. This holds even for programs without any input operations, as they are expressive enough to simulate any Turing machine. Thus, actual callability must be approximated by a corresponding static property of a program, which we call formal callability. It is defined in terms of the formal execution tree $\mathcal{T}_\Pi$ of a program $\Pi$. In essence, $\mathcal{T}_\Pi$ results from successively unfolding the argument program $\Pi$ by applying the static scope copy rule to the procedure calls currently reachable (cf. [La1, Ol2]). In particular, $\mathcal{T}_\Pi(k)$, $k \in \mathbb{N}$, denotes the subtree of $\mathcal{T}_\Pi$ in which the distance between the root and its leaves is less than or equal to $k$. This is illustrated in Example 6.1.2, which shows the program resulting from an application of the copy rule to the program of Example 6.1.1:

Example 6.1.2.

    (π_1 :
      (π_11 : ... ,
       π_12 : ... ,
       π_13 ( : : φ_1, φ_2) :
         (π_131 : ... ,
          π_132 : ... ,
          π_133 : ... ; φ_2; ... ,
          π_134 : ... ; φ_2; ... )
         ... ; π_13( : : π_133, π_131); ... ; π_13( : : π_134, π_132); ... ; φ_1; ... )
      {IGB})

where IGB is an abbreviation of

    ((π_131 : ... ,
      π_132 : ... ,
      π_133 : ... ; π_12; ... ,
      π_134 : ... ; π_12; ... )
     ... ; π_13( : : π_133, π_131); ... ; π_13( : : π_134, π_132); ... ; π_11; ... )



The acronym IGB used in Example 6.1.2 stands for innermost generated block. For clarity, the IGB is usually enclosed in curly brackets, as shown in the example above (cf. [La1]). The notion of an innermost generated block will be important in the following definitions. Intuitively, the innermost generated block is the modified copy of the body of the procedure invoked by the call statement to which the copy rule has most recently been applied. We remark that in programs of $\mathcal{T}_\Pi$ the copy rule is always applied to procedure calls occurring in the currently innermost generated block.
Definition 6.1.1 (Formal Callability).

Let $\Pi \in \mathit{Prog}$, $\kappa \in \Pi$, and $\iota \in \mathcal{I}_P(\Pi)$.

1. A procedure $\pi$ is $k$-callable by $\iota$ in $\kappa$ iff there is a call statement $st'$ with $\mathit{CalledProc}(st') = \pi$ in the main part of the innermost generated block of a program $\Pi' \in \mathcal{T}_\Pi(k)$, which is the copy of a call statement $st \in \mathit{Stmt}_{call}(\kappa)$ with $\mathit{CalledProc}(st) = \iota$.

   Let $\mathcal{FC}^k_\kappa(\iota)$ denote the set of all procedures which are $k$-callable by $\iota$ in $\kappa$.

2. A procedure $\pi$ is formally callable by $\iota$ in $\kappa$ iff $\pi \in \bigcup \{\, \mathcal{FC}^k_\kappa(\iota) \mid k \in \mathbb{N} \,\}$.

   Let $\mathcal{FC}_\kappa(\iota)$ denote the set of all procedures which are formally callable by $\iota$ in $\kappa$.

3. A procedure $\pi$ is formally callable by $\iota$ iff $\pi \in \bigcup \{\, \mathcal{FC}_\kappa(\iota) \mid \kappa \in \Pi \,\}$.

   Let $\mathcal{FC}(\iota)$ denote the set of all procedures which are formally callable by $\iota$.

Considering Example 6.1.1, the procedures $\pi_{11}$, $\pi_{133}$ and $\pi_{134}$ are formally callable by $\phi_1$ in $\pi_{13}$, and the procedures $\pi_{131}$ and $\pi_{132}$ are formally callable by $\phi_2$ in $\pi_{133}$ and $\pi_{134}$, respectively. As we are going to show next, formal callability is a direct refinement of formal reachability (cf. [La1]), which, intuitively, is concerned with the question which procedures of a program may be called at run-time.

Definition 6.1.2 (Formal Reachability).

Let $\Pi \in \mathit{Prog}$, and $\pi \in \Pi$. The procedure $\pi$ is formally reachable (in signs: $\mathit{FormReach}(\pi)$) iff there is a program $\Pi' \in \mathcal{T}_\Pi$ whose innermost generated block is a copy of $\pi$.

Let $\mathcal{FR}(\Pi) =_{df} \{\, \pi \in \Pi \mid \mathit{FormReach}(\pi) \,\}$ denote the set of all procedures of $\Pi$ which are formally reachable.

Intuitively, formal reachability of a procedure indicates whether it can (actually) be called at run-time. Formal reachability is necessary for actual callability, i.e., a procedure which is not formally reachable is guaranteed never to be called at run-time. The converse implication is in general invalid. In addition to formal reachability, formal callability even pinpoints those call statements in a program which are responsible for the reachability. Formal callability is thus a natural refinement of formal reachability: a procedure $\pi$ is formally reachable if and only if it is formally callable by some $\iota \in \mathcal{I}_P(\Pi)$.

Theorem 6.1.1 (Refinement Theorem).

$$\forall \Pi \in \mathit{Prog}\ \forall \pi \in \Pi.\ \mathit{FormReach}(\pi) \iff \pi \in \bigcup \{\, \mathcal{FC}(\iota) \mid \iota \in \mathcal{I}_P(\Pi) \,\}$$
As a consequence of the Refinement Theorem 6.1.1 we directly obtain that every decision procedure for formal callability is a decision procedure for formal reachability as well. As it is known that formal reachability is not decidable in general (cf. [La1]), we have the following negative result:

Theorem 6.1.2 (Undecidability of Formal Callability).

It is undecidable whether a procedure $\pi$ of a Prog-program $\Pi$ is formally callable by a procedure identifier $\iota$.

Note that Theorem 6.1.2 does not exclude the decidability of formal callability for sublanguages of Prog. However, the Refinement Theorem 6.1.1 yields that a decision procedure for formal callability is computationally at least as complex as a decision procedure for formal reachability. This is important because the results concerning the computational complexity of formal reachability for sublanguages of Prog carry over to the computational complexity of formal callability. At first sight these results are discouraging. In the general case of unbounded finite mode languages (e.g., ISO-Pascal [ISO]), the formal reachability problem is P-space hard [Ar3, Wi2]. More encouragingly, for programming languages of mode depth 2 (e.g., Wirth's Pascal [Wth, HW]) and a limit on the length of parameter lists, formal reachability is decidable in polynomial time [Ar3]. However, formal reachability is too coarse to avoid worst-case assumptions for formal procedure calls, because it only yields the existence of a procedure call statement calling a particular procedure, but does not explicitly pinpoint the set of statements that are responsible for its reachability, as formal callability does (cf. Refinement Theorem 6.1.1). Even more, as demonstrated in Example 6.1.1, formal callability distinguishes even occurrences of formal procedure calls of the same formal procedure identifier which are located in different procedures. In fact, the set of procedures which are callable by a formal procedure identifier $\iota$ depends on the procedure containing the formal call statement of $\iota$.

The theoretical limitations of deciding formal callability and the efficiency requirements imposed by program optimization give rise to considering approximations of formal callability which satisfy the following requirements:

1. formal callability is correctly (safely) approximated,

2. for every program $\Pi \in \mathit{Prog}$, the approximation is efficiently computable, and

3. for certain sublanguages of Prog, the approximation is precise for formal callability.

In Section 6.2 and Section 6.3 we develop stepwise an approximation satisfying these requirements. Central are the notions of formal passability and potential passability. The results on potential passability presented in Section 6.4 show that potential passability meets the requirements introduced above.
6.2 Formal Passability

Intuitively, formal passability deals with the question which procedures can be passed as an argument to a procedure parameter. Formal passability of a procedure $\pi$ to a procedure parameter $\iota$ is thus a necessary condition for the formal callability of $\pi$ by $\iota$ (cf. Theorem 6.2.1). As we will show, for programs of $\mathit{Prog}_{wgfpp}$ it is even sufficient. This means that for programs without global formal procedure parameters, formal callability and formal passability coincide (cf. Theorem 6.2.3).

Definition 6.2.1 (Formal Passability).

Let $\Pi \in \mathit{Prog}$, and $\iota \in \mathcal{I}_{FP}(\Pi)$.

1. A procedure $\pi$ is $k$-passable to $\iota$ iff there is a call statement $st'$ with $\mathit{CalledProc}(st') = \mathit{decl}(\iota)$ in the main part of the innermost generated block of a program $\Pi' \in \mathcal{T}_\Pi(k)$ such that $\mathit{ParLst}(st')\!\downarrow_{\mathit{pos}(\iota)} = \pi$.

   Let $\mathcal{FP}_k(\iota)$ denote the set of all procedures which are $k$-passable to $\iota$.

2. A procedure $\pi$ is formally passable to $\iota$ iff $\pi \in \bigcup \{\, \mathcal{FP}_k(\iota) \mid k \in \mathbb{N} \,\}$.

   Let $\mathcal{FP}(\iota)$ denote the set of all procedures which are formally passable to $\iota$.

It can easily be proved that formal callability implies formal passability.

Theorem 6.2.1 (Passability).

$$\forall \Pi \in \mathit{Prog}\ \forall \iota \in \mathcal{I}_{FP}(\Pi).\ \mathcal{FC}(\iota) \subseteq \mathcal{FP}(\iota)$$

In general, the inclusion in the Passability Theorem 6.2.1 is proper, as illustrated by the program of Example 6.1.1. For this example it can easily be checked that $\mathcal{FC}_{\pi_{133}}(\phi_2) = \{\pi_{131}\}$ and $\mathcal{FC}_{\pi_{134}}(\phi_2) = \{\pi_{132}\}$, whereas $\mathcal{FP}(\phi_2) = \{\pi_{12}, \pi_{131}, \pi_{132}\}$, and that both procedures $\pi_{133}$ and $\pi_{134}$ are formally reachable. As a consequence of this example we thus directly obtain:

Theorem 6.2.2.

1. $\exists \Pi \in \mathit{Prog}_{fm(2)}\ \exists \iota \in \mathcal{I}_{FP}(\Pi)\ \exists \pi \in \mathcal{FR}(\Pi).\ c\text{-}occ_\pi(\iota) \neq \emptyset \wedge \mathcal{FC}_\pi(\iota) \subsetneq \mathcal{FP}(\iota)$

2. $\exists \Pi \in \mathit{Prog}_{fm(2)}\ \exists \iota \in \mathcal{I}_{FP}(\Pi)\ \exists \pi, \pi' \in \mathcal{FR}(\Pi).\ c\text{-}occ_\pi(\iota) \neq \emptyset \wedge c\text{-}occ_{\pi'}(\iota) \neq \emptyset \wedge \mathcal{FC}_\pi(\iota) \neq \mathcal{FC}_{\pi'}(\iota)$

Theorem 6.2.2 shows that even for programs of mode depth 2, formal callability and formal passability are in general not equivalent. The point here is that $\mathit{Prog}_{fm(2)}$-programs may have global formal procedure parameters. In fact, for programs without global formal procedure parameters, i.e., for $\mathit{Prog}_{wgfpp}$-programs, we have:
Theorem 6.2.3 ($\mathit{Prog}_{wgfpp}$).

$$\forall \Pi \in \mathit{Prog}_{wgfpp}\ \forall \iota \in \mathcal{I}_{FP}(\Pi).\ c\text{-}occ(\iota) \neq \emptyset \Rightarrow \mathcal{FC}(\iota) = \mathcal{FC}_{\mathit{decl}(\iota)}(\iota) = \mathcal{FP}(\iota)$$

Theorem 6.2.3 does not depend on the mode depth of the program under consideration. However, computing formal passability for $\mathit{Prog}_{fm(k),wgfpp}$-programs with $k \ge 3$ causes similar problems as for programs of mode depth 2 with global formal procedure parameters. This is illustrated by the two program fragments of Example 6.2.1.

Example 6.2.1.

Let $\Pi_1 \in \mathit{Prog}_{fm(2)}$ and $\Pi_2 \in \mathit{Prog}_{fm(3),wgfpp}$ be the programs shown below. Note that all parameters occurring in $\Pi_1$ and $\Pi_2$ are procedure parameters.

    Π_1 = (π_1 :
            (π_11 : ... ,
             π_12 : ... ,
             π_13 ( : : φ_1, φ_2) :
               (π_131 ( : : φ_3) : ... ; φ_3; ... ,
                π_132 : ... ,
                π_133 : ... ,
                π_134 : ... ; φ_2; ... ; π_131( : : φ_2); ... ,
                π_135 : ... ; φ_2; ... )
               ... ; π_13( : : π_134, π_132); ... ; π_13( : : π_135, π_133); ... ; φ_1; ... )
            π_13( : : π_11, π_12) )

    Π_2 = (π_1 :
            (π_11 ( : : φ_1( : : proc), φ_2) :
               (π_111 ( : : φ_3) : ... ,
                π_112 : ... )
               ... ; φ_1( : : φ_2); ... ; π_11( : : π_111, π_112); ... ,
             π_12 ( : : φ_4) : ... ; φ_4; ... ,
             π_13 : ... )
            π_11( : : π_12, π_13) )

Considering first program $\Pi_1$, we obtain that the procedures $\pi_{131}$, $\pi_{134}$, and $\pi_{135}$ of $\Pi_1$ are formally reachable. Moreover, we can easily check the inclusion

$$\mathcal{FP}(\phi_3) = \{\pi_{132}\} \subset \{\pi_{12}, \pi_{132}, \pi_{133}\} = \mathcal{FP}(\phi_2)$$

Hence, $\mathcal{FP}(\phi_3)$ is a proper subset of $\mathcal{FP}(\phi_2)$. Considering now the program $\Pi_2$, we similarly get the inclusion

$$\mathcal{FP}(\phi_4) = \{\pi_{13}\} \subset \{\pi_{13}, \pi_{112}\} = \mathcal{FP}(\phi_2)$$

which is proper as well.
Note that in the program $\Pi_1$ of Example 6.2.1, the set $\mathcal{FP}(\phi_3)$ depends directly on the actual values of $\phi_2$ only, but indirectly also on the actual values of $\phi_1$. Intuitively, this is because the procedure $\pi_{134}$ is only reachable for particular pairs of elements of $\mathcal{FP}(\phi_1)$ and $\mathcal{FP}(\phi_2)$. Therefore, detecting the equality $\mathcal{FP}(\phi_3) = \{\pi_{132}\}$ requires bookkeeping over all combinations of arguments occurring in a call of procedure $\pi_{13}$ in $\Pi_1$. Clearly, this is much more complex than bookkeeping which keeps track of the set of arguments for every procedure parameter separately. In fact, in the first case we have up to

$$|\Pi|^{|\mathcal{I}_{FP}(\pi)|}$$

different parameter combinations for a procedure $\pi$, whereas in the second case this number can be estimated by

$$|\mathcal{I}_{FP}(\pi)| \cdot |\Pi|$$
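The gap between the two bookkeeping strategies can be made concrete by counting. The Python sketch below (function names hypothetical) enumerates all argument combinations for a procedure with `n_params` formal procedure parameters in a program with `n_procs` procedures and compares the count with the per-parameter estimate.

```python
from itertools import product


def joint_entries(n_params, n_procs):
    """Bookkeeping over all argument combinations: |Pi| ** |I_FP(pi)| entries."""
    return sum(1 for _ in product(range(n_procs), repeat=n_params))


def separate_entries(n_params, n_procs):
    """Bookkeeping per parameter: at most |I_FP(pi)| * |Pi| entries."""
    return n_params * n_procs


# For instance, three formal procedure parameters and ten procedures:
print(joint_entries(3, 10), separate_entries(3, 10))   # 1000 30
```

The first count grows exponentially in the number of formal procedure parameters, the second only linearly.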

Similar effects occur in programs with finite mode depth greater than 2, even if they are free of global formal procedure parameters. This is illustrated by the program $\Pi_2$ of Example 6.2.1, which was originally given in [Ar3] in order to illustrate the intractability of the formal reachability problem for programs with finite mode depths greater than 2. Similarly to the parameter $\phi_3$ in program $\Pi_1$, the actual values of $\phi_4$ in $\Pi_2$ depend directly only on the actual values of $\phi_2$, but indirectly also on the values of $\phi_1$. Detecting this requires bookkeeping over all combinations of arguments occurring in procedure calls. Thus, deciding formal passability for programs of mode depth greater than 2, even without global formal procedure identifiers, is in general expensive.

In contrast, for programs of mode depth 2 without global formal procedure parameters, effects as illustrated above do not occur. In essence, this is a consequence of Property (6.1) of $\mathit{Prog}_{fm(2)}$-programs. Intuitively, it means that in programs with mode depth 2, formal procedure calls do not have procedures as arguments.

$$\forall st \in \mathit{Stmt}_{call}(\Pi).\ \mathit{CalledProc}(st) \in \mathcal{I}_{FP}(\Pi) \Rightarrow \mathit{ParLst}(st) \cap \mathcal{I}_P(\Pi) = \emptyset \qquad (6.1)$$

This property is the key for constructing an algorithm which computes formal callability for $\mathit{Prog}_{fm(2),wgfpp}$-programs efficiently. It relies on the notion of potential passability.
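Property (6.1) is a purely syntactic check on call statements. A Python sketch, assuming a call statement is represented as a pair of the called identifier and its argument tuple (a hypothetical representation), applied to the call statements of the two programs of Example 6.2.1:

```python
def satisfies_6_1(calls, formal_ids, proc_ids):
    """True iff no call of a formal procedure identifier has a procedure identifier as argument."""
    return all(
        not (set(args) & proc_ids)
        for callee, args in calls
        if callee in formal_ids
    )


# Pi_1 of Example 6.2.1 (mode depth 2): its formal calls take no arguments.
PI1_CALLS = [("pi13", ("pi11", "pi12")), ("pi13", ("pi134", "pi132")),
             ("pi13", ("pi135", "pi133")), ("phi1", ()), ("phi3", ()),
             ("phi2", ()), ("pi131", ("phi2",)), ("phi2", ())]
PI1_FORMALS = {"phi1", "phi2", "phi3"}
PI1_PROCS = {"pi1", "pi11", "pi12", "pi13", "pi131", "pi132", "pi133", "pi134", "pi135"}

# Pi_2 of Example 6.2.1 (mode depth 3): the formal call phi_1(::phi_2) passes a procedure.
PI2_CALLS = [("phi1", ("phi2",)), ("pi11", ("pi111", "pi112")),
             ("phi4", ()), ("pi11", ("pi12", "pi13"))]
PI2_FORMALS = {"phi1", "phi2", "phi3", "phi4"}
PI2_PROCS = {"pi1", "pi11", "pi111", "pi112", "pi12", "pi13"}

# I_P(Pi) comprises ordinary as well as formal procedure identifiers.
print(satisfies_6_1(PI1_CALLS, PI1_FORMALS, PI1_PROCS | PI1_FORMALS))  # True
print(satisfies_6_1(PI2_CALLS, PI2_FORMALS, PI2_PROCS | PI2_FORMALS))  # False
```

Note that $\Pi_1$ satisfies (6.1) although it has global formal procedure parameters; (6.1) alone therefore does not rule out the effects discussed for $\Pi_1$.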



6.3 Potential Passability 

Like formal passability, potential passability deals with the question which procedures are passable as an argument to a procedure parameter. Parameterized with a correct (safe) approximation of the set of formally reachable procedures of the argument program, it yields a correct (safe) approximation of formal passability. However, in contrast to formal passability, potential passability can be computed efficiently for all Prog-programs. Moreover, for $\mathit{Prog}_{fm(2),wgfpp}$-programs, potential passability (parameterized with the set of formally reachable procedures of the argument program), formal passability, and formal callability coincide.

The computation of potential passability proceeds in two steps. First, a correct (safe) approximation of formal reachability is computed; second, potential passability is computed with respect to the reachability information of the first step.

Note that formal reachability is correctly (safely) approximated by the elements of the set

$$\mathcal{FR}\mathit{App}(\Pi) =_{df} \{\, A \mid \mathcal{FR}(\Pi) \subseteq A \,\}$$

Obviously, we have $\mathcal{FR}\mathit{App}(\Pi) \neq \emptyset$ because $\Pi \in \mathcal{FR}\mathit{App}(\Pi)$. This implies that there is always a safe, though trivial, approximation of formal reachability, even for programming languages for which formal reachability is undecidable. Note, however, that the accuracy of potential passability with respect to formal passability depends on the accuracy of the reachability information. Thus, it is important to recall that for programs with mode depth 2 and a limit on the length of parameter lists, formal reachability is computable in polynomial time (cf. [Ar3]).

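The second step of computing potential passability is characterized by Equation System 6.3.1, where $A \in \mathcal{FR}\mathit{App}(\Pi)$.

Equation System 6.3.1 (Potential Passability).

$$
\mathcal{PP}(\iota) =
\begin{cases}
\{\iota\} & \text{if } \iota \in \mathcal{I}_{OP}(\Pi) \\
\bigcup \{\, \mathcal{PP}(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\{\, \kappa \mid \mathit{decl}(\iota) \in \mathcal{PP}(\kappa) \,\}) \,\} & \text{otherwise}
\end{cases}
$$

Denoting the least solution of Equation System 6.3.1 with respect to $A$ by $\mathcal{PP}_A$, potential passability is defined as follows:

Definition 6.3.2 (Potential Passability).

Let $\Pi \in \mathit{Prog}$, $A \in \mathcal{FR}\mathit{App}(\Pi)$, and $\iota \in \mathcal{I}_{FP}(\Pi)$. A procedure $\pi$ is potentially passable to $\iota$ if and only if $\pi \in \mathcal{PP}_A(\iota)$.

Obviously, we have:

$$\forall \pi \in \mathcal{I}_{OP}(\Pi).\ \mathcal{PP}_A(\pi) = \{\pi\}$$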
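Since the right-hand sides of Equation System 6.3.1 are monotonic in $\mathcal{PP}$, its least solution can be computed by a standard round-robin fixpoint iteration starting from the empty sets; it terminates because the sets only grow and are bounded by the finite set of procedures. The Python sketch below is one such computation under assumed, hypothetical representations (not the book's implementation): `formals` maps every procedure to its list of formal procedure parameters, a call statement is a triple `(caller, callee, args)`, and only calls inside procedures of `A` are considered, mirroring $c\text{-}occ_{\mathit{Stmt}(A)}$. It is applied to the programs of Example 6.2.1 with the trivial approximation $A = \Pi$.

```python
def potential_passability(formals, calls, A):
    """Least solution PP_A of Equation System 6.3.1 by round-robin fixpoint iteration.

    formals: procedure -> list of its formal procedure parameters (I_OP = keys)
    calls:   list of (caller, callee, args); only calls inside procedures of A count
    A:       safe approximation of the formally reachable procedures
    """
    decl = {f: p for p, params in formals.items() for f in params}
    pos = {f: i for params in formals.values() for i, f in enumerate(params)}
    pp = {p: {p} for p in formals}        # PP(pi) = {pi} for ordinary identifiers
    pp.update({f: set() for f in decl})   # formal parameters start at the bottom element
    changed = True
    while changed:
        changed = False
        for f, owner in decl.items():
            new = set()
            for caller, callee, args in calls:
                # st in c-occ_Stmt(A)({kappa | decl(f) in PP(kappa)})
                if caller in A and owner in pp[callee]:
                    new |= pp[args[pos[f]]]
            if not new <= pp[f]:
                pp[f] |= new
                changed = True
    return pp


PI1 = {"pi1": [], "pi11": [], "pi12": [], "pi13": ["phi1", "phi2"], "pi131": ["phi3"],
       "pi132": [], "pi133": [], "pi134": [], "pi135": []}
PI1_CALLS = [("pi1", "pi13", ("pi11", "pi12")), ("pi13", "pi13", ("pi134", "pi132")),
             ("pi13", "pi13", ("pi135", "pi133")), ("pi13", "phi1", ()),
             ("pi131", "phi3", ()), ("pi134", "phi2", ()),
             ("pi134", "pi131", ("phi2",)), ("pi135", "phi2", ())]
PI2 = {"pi1": [], "pi11": ["phi1", "phi2"], "pi111": ["phi3"], "pi112": [],
       "pi12": ["phi4"], "pi13": []}
PI2_CALLS = [("pi11", "phi1", ("phi2",)), ("pi11", "pi11", ("pi111", "pi112")),
             ("pi12", "phi4", ()), ("pi1", "pi11", ("pi12", "pi13"))]

pp1 = potential_passability(PI1, PI1_CALLS, A=set(PI1))
pp2 = potential_passability(PI2, PI2_CALLS, A=set(PI2))
```

The sketch yields $\mathcal{PP}(\phi_3) = \{\pi_{12}, \pi_{132}, \pi_{133}\}$ for $\Pi_1$ and $\mathcal{PP}(\phi_4) = \{\pi_{13}, \pi_{112}\}$ for $\Pi_2$, i.e., strict supersets of the corresponding formal passability sets.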

Theorem 6.3.1 yields that potential passability is a correct (safe) approximation of formal passability and thus, by means of the Passability Theorem 6.2.1, also a correct (safe) approximation of formal callability.

Theorem 6.3.1 (Correctness (Safety)).

$$\forall \Pi \in \mathit{Prog}\ \forall A \in \mathcal{FR}\mathit{App}(\Pi)\ \forall \iota \in \mathcal{I}_{FP}(\Pi).\ \mathcal{FP}(\iota) \subseteq \mathcal{PP}_A(\iota)$$



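Proof. Let $\iota \in \mathcal{I}_{FP}(\Pi)$ and $\pi \in \mathcal{FP}(\iota)$. According to Definition 6.2.1 there are a $k \in \mathbb{N}$, a program $\Pi' \in \mathcal{T}_\Pi(k)$, and a call statement $st'$ with $\mathit{CalledProc}(st') = \mathit{decl}(\iota)$ in the main part of the innermost generated block of $\Pi'$ with $\mathit{ParLst}(st')\!\downarrow_{\mathit{pos}(\iota)} = \pi$. Under this assumption we have to show:

$$\pi \in \mathcal{PP}_A(\iota) \qquad (6.2)$$

This is proved by induction on the number $k$ of applications of the copy rule which are necessary to obtain program $\Pi'$.

For $k = 0$, we have $\Pi' = \Pi$. Moreover, there is a call statement $st$ in $\Pi$ with $\mathit{CalledProc}(st) = \mathit{decl}(\iota) \in \mathcal{I}_{OP}(\Pi)$ and $\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)} = \pi$. Obviously, we have $st \in \mathit{Stmt}_{call}(\pi_1)$, and therefore $st \in c\text{-}occ_{\mathit{Stmt}(A)}(\mathit{decl}(\iota))$. Moreover, we have $\pi \in \mathcal{I}_{OP}(\Pi)$. The induction base now follows from the following sequence of inclusions:

$$
\begin{aligned}
\mathcal{PP}_A(\iota)
&= \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\{\, \kappa \mid \mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa) \,\}) \,\} \\
&\supseteq \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\mathit{decl}(\iota)) \,\} && (\mathit{decl}(\iota) \in \mathcal{I}_{OP}(\Pi)) \\
&\supseteq \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) && (st \in c\text{-}occ_{\mathit{Stmt}(A)}(\mathit{decl}(\iota))) \\
&= \mathcal{PP}_A(\pi) && (\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)} = \pi) \\
&= \{\pi\} && (\pi \in \mathcal{I}_{OP}(\Pi))
\end{aligned}
$$

The induction step for $k > 0$ can now be proved under the induction hypothesis

$$\forall \iota \in \mathcal{I}_{FP}(\Pi)\ \forall\, 0 \le j < k.\ \pi \in \mathcal{FP}_j(\iota) \Rightarrow \pi \in \mathcal{PP}_A(\iota) \qquad \text{(IH)}$$

It is sufficient to show that for every $\iota \in \mathcal{I}_{FP}(\Pi)$ and every $\pi \in \mathcal{FP}_k(\iota) \setminus \mathcal{FP}_{k-1}(\iota)$ we have:

$$\pi \in \mathcal{PP}_A(\iota)$$

Without loss of generality, we can assume that there is a $\pi \in \mathcal{FP}_k(\iota) \setminus \mathcal{FP}_{k-1}(\iota)$ for some $\iota \in \mathcal{I}_{FP}(\Pi)$, since otherwise the induction hypothesis would suffice. Then there is a call statement $st'$ in $\Pi'$ with $\mathit{CalledProc}(st') = \mathit{decl}(\iota) \in \mathcal{I}_{OP}(\Pi)$ and $\mathit{ParLst}(st')\!\downarrow_{\mathit{pos}(\iota)} = \pi \in \mathcal{I}_{OP}(\Pi)$, which is the copy of a call statement $st$ of $\Pi$ with $\mathit{CalledProc}(st) = \kappa$ and $\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)} = \phi$. Moreover, we have $st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa)$. Thus, we are left with investigating the following four cases.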
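Case 1. $\kappa, \phi \in \mathcal{I}_{OP}(\Pi)$.

In this case we obtain $\kappa = \mathit{decl}(\iota)$ and $\phi = \pi$. Then the following sequence of inclusions

$$
\begin{aligned}
\mathcal{PP}_A(\iota)
&= \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\{\, \kappa \mid \mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa) \,\}) \,\} \\
&\supseteq \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa) \,\} && (\mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa)) \\
&\supseteq \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) && (st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa)) \\
&= \mathcal{PP}_A(\phi) && (\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)} = \phi) \\
&= \{\pi\} && (\pi = \phi)
\end{aligned}
$$

completes the proof of Case 1.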

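Case 2. $\kappa \in \mathcal{I}_{OP}(\Pi)$, $\phi \in \mathcal{I}_{FP}(\Pi)$.

In this case we have $\kappa = \mathit{decl}(\iota)$. Moreover, there is a call statement $st''$ in a predecessor $\Pi''$ of $\Pi'$ in $\mathcal{T}_\Pi(k)$ with $\mathit{CalledProc}(st'') = \mathit{decl}(\phi)$ and $\mathit{ParLst}(st'')\!\downarrow_{\mathit{pos}(\phi)} = \pi$. Thus, we have $\pi \in \mathcal{FP}_{k-1}(\phi)$, and therefore, by means of the induction hypothesis (IH), $\pi \in \mathcal{PP}_A(\phi)$. Similarly as in Case 1 we obtain

$$
\begin{aligned}
\mathcal{PP}_A(\iota)
&= \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\{\, \kappa \mid \mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa) \,\}) \,\} \\
&\supseteq \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa) \,\} && (\mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa)) \\
&\supseteq \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) && (st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa)) \\
&= \mathcal{PP}_A(\phi) && (\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)} = \phi) \\
&\supseteq \{\pi\} && (\pi \in \mathcal{PP}_A(\phi))
\end{aligned}
$$

which completes the proof of Case 2.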

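Case 3. $\kappa \in \mathcal{I}_{FP}(\Pi)$, $\phi \in \mathcal{I}_{OP}(\Pi)$.

Here, we have $\phi = \pi$. Additionally, there is a call statement $st''$ in a predecessor $\Pi''$ of $\Pi'$ in $\mathcal{T}_\Pi(k)$ with $\mathit{CalledProc}(st'') = \mathit{decl}(\kappa)$ and $\mathit{ParLst}(st'')\!\downarrow_{\mathit{pos}(\kappa)} = \mathit{decl}(\iota)$. Thus, we obtain $\mathit{decl}(\iota) \in \mathcal{FP}_{k-1}(\kappa)$, and therefore, by means of the induction hypothesis (IH), $\mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa)$. Hence, we have:

$$
\begin{aligned}
\mathcal{PP}_A(\iota)
&= \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\{\, \kappa \mid \mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa) \,\}) \,\} \\
&\supseteq \bigcup \{\, \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) \mid st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa) \,\} && (\mathit{decl}(\iota) \in \mathcal{PP}_A(\kappa)) \\
&\supseteq \mathcal{PP}_A(\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)}) && (st \in c\text{-}occ_{\mathit{Stmt}(A)}(\kappa)) \\
&= \mathcal{PP}_A(\phi) && (\mathit{ParLst}(st)\!\downarrow_{\mathit{pos}(\iota)} = \phi) \\
&= \{\pi\} && (\pi = \phi)
\end{aligned}
$$

which completes the proof of Case 3.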

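Case 4. $\kappa, \phi \in \mathcal{I}_{fp}(\Pi)$.

First, there is a call statement $st''$ in a predecessor $\Pi''$ of $\Pi'$ with $CalledProc(st'') = decl(\kappa)$ and $ParLst(st'')\downarrow_{pos(\kappa)} = decl(\iota)$. Hence, we obtain $decl(\iota) \in \mathcal{FP}_{k-1}(\kappa)$, and therefore, by means of the induction hypothesis (IH), $decl(\iota) \in \mathcal{PP}_A(\kappa)$. Second, there is a call statement $st'''$ in a predecessor $\Pi'''$ of $\Pi'$ with $CalledProc(st''') = decl(\phi)$ and $ParLst(st''')\downarrow_{pos(\phi)} = \pi$. Thus, we have $\pi \in \mathcal{FP}_{k-1}(\phi)$, and therefore, again by means of the induction hypothesis (IH), $\pi \in \mathcal{PP}_A(\phi)$. Summarizing, we obtain

$$
\begin{aligned}
\mathcal{PP}_A(\iota) &= \bigcup \{\, \mathcal{PP}_A(ParLst(st)\downarrow_{pos(\iota)}) \mid st \in c\text{-}occ_{stmt(A)}(\{\, \kappa' \mid decl(\iota) \in \mathcal{PP}_A(\kappa') \,\}) \,\} \\
&\supseteq \bigcup \{\, \mathcal{PP}_A(ParLst(st)\downarrow_{pos(\iota)}) \mid st \in c\text{-}occ_{stmt(A)}(\kappa) \,\} && (decl(\iota) \in \mathcal{PP}_A(\kappa)) \\
&\supseteq \mathcal{PP}_A(ParLst(st)\downarrow_{pos(\iota)}) && (st \in c\text{-}occ_{stmt(A)}(\kappa)) \\
&= \mathcal{PP}_A(\phi) && (ParLst(st)\downarrow_{pos(\iota)} = \phi) \\
&\supseteq \{\pi\} && (\pi \in \mathcal{PP}_A(\phi))
\end{aligned}
$$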
This completes the proof of Case 4, and finishes the proof of Theorem 6.3.1. 

□ 

In general, the converse inclusion of Theorem 6.3.1 does not hold. This can be demonstrated, for example, by the programs $\Pi_1$ and $\Pi_2$ of Example 6.2.1. It is easy to check the validity of the inclusions

$$\mathcal{FP}(\phi_3) = \{\pi_{132}\} \subsetneq \{\pi_{12}, \pi_{132}, \pi_{133}\} = \mathcal{PP}_{\mathcal{FR}(\Pi_1)}(\phi_3)$$

and

$$\mathcal{FP}(\phi_4) = \{\pi_{13}\} \subsetneq \{\pi_{13}, \pi_{112}\} = \mathcal{PP}_{\mathcal{FR}(\Pi_2)}(\phi_4)$$

for $\Pi_1$ and $\Pi_2$, respectively. Intuitively, the inaccuracy of potential passability for programs with global formal procedure parameters or for programs with mode depth greater than 2, even if $A = \mathcal{FR}(\Pi)$, is caused by the fact that potential passability implicitly assumes that the passability of a procedure identifier is independent of that of other procedure identifiers. As illustrated in Example 6.2.1, this is in general not true. However, for programs with mode depth 2 without global formal procedure parameters, potential passability and formal passability coincide (cf. Theorem 6.4.1).

Formal Reference Parameters and Aliasing. The notions of formal passability and potential passability can naturally be extended to formal reference parameters. This is important because, if formal reference parameters are regarded as parameterless formal procedures, Equation System 6.3.1 directly yields a characterization of the set of may-aliases of a formal reference parameter. Moreover, it is easy to modify the equation system in order to also obtain a characterization of the set of must-aliases of a reference parameter. The HO-DFA can thus immediately be used for computing alias information of formal reference parameters. In particular, this also holds in the presence of both formal reference parameters and formal procedure parameters in a program. Conceptually, this approach is significantly different from the traditional approaches for computing the aliases of formal reference parameters (cf. [Ban, Co, CpK2, LH, We]).



6.3.1 Computing Potential Passability 

The practical relevance of potential passability is due to the fact that Equa- 
tion System 6.3.1 directly specifies an iterative procedure for computing it. 
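To illustrate this iterative reading of Equation System 6.3.1, the following Python sketch solves it by plain round-robin (Kleene) iteration from the least element; it is not Algorithm 6.3.3 itself. It relies on a hypothetical, much simplified program encoding that is not taken from the text: `formals` maps each formal procedure parameter ι to the pair (decl(ι), pos(ι)), each call statement is a triple (enclosing procedure, CalledProc(st), procedure arguments of ParLst(st)), and `reachable` stands for the approximation A.

```python
def potential_passability_naive(ordinary, formals, calls, reachable):
    """Least solution of Equation System 6.3.1 by round-robin iteration.

    Hypothetical encoding (not taken from the text):
      ordinary:  set of ordinary procedure identifiers, I_op
      formals:   dict mapping a formal parameter iota to (decl(iota), pos(iota))
      calls:     list of (enclosing procedure, CalledProc(st), procedure arguments)
      reachable: the safe approximation A of formal reachability
    """
    # Least element: {iota} for ordinary identifiers, the empty set for formals.
    pp = {i: {i} for i in ordinary}
    pp.update({i: set() for i in formals})
    stmts = [c for c in calls if c[0] in reachable]  # call statements occurring in A
    changed = True
    while changed:
        changed = False
        for iota, (decl, pos) in formals.items():
            # identifiers kappa that may currently denote decl(iota)
            kappas = {k for k, v in pp.items() if decl in v}
            new = set()
            for _, called, args in stmts:
                if called in kappas:  # st in c-occ_stmt(A)(kappas)
                    new |= pp[args[pos]]
            if not new <= pp[iota]:
                pp[iota] |= new
                changed = True
    return pp


# main calls P(Q); P(f) calls its formal f with argument R; Q(g) calls g().
ordinary = {"main", "P", "Q", "R"}
formals = {"f": ("P", 0), "g": ("Q", 0)}
calls = [("main", "P", ("Q",)), ("P", "f", ("R",)), ("Q", "g", ())]
pp = potential_passability_naive(ordinary, formals, calls, reachable=ordinary)
print(pp["f"], pp["g"])  # {'Q'} {'R'}
```

Since annotations only grow and are bounded by the finite set of ordinary identifiers, the loop terminates, and starting from empty annotations it stops at the least solution.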

Algorithm 6.3.3 (Computing Potential Passability). 

Input: A program $\Pi \in Prog$, and a correct (safe) approximation $A \in \mathcal{FR}\text{-}App(\Pi)$ of formal reachability.

Output: For every procedure identifier $\iota \in \mathcal{I}_p(\Pi)$ occurring in $\Pi$ the least solution $\mathcal{PP}_A(\iota)$ of Equation System 6.3.1 (stored in pp[ι]).

Remark: The variable workset controls the iterative process, and the variable M stores the most recent approximation for the procedure identifier currently processed.

( Initialization of the annotation array pp and the variable workset )

FORALL ι ∈ I_p(Π) DO
    IF ι ∈ I_op(Π) THEN pp[ι] := {ι} ELSE pp[ι] := ∅ FI
OD;
workset := I_fp(Π);

( Iterative fixed point computation )

WHILE workset ≠ ∅ DO
    LET ι ∈ workset
    BEGIN
        workset := workset \ {ι};
        M := pp[ι] ∪ ⋃{ pp[ParLst(st)↓pos(ι)] | st ∈ c-occ_stmt(A)({ κ | decl(ι) ∈ pp[κ] }) };
        IF pp[ι] ⊊ M
        THEN
            pp[ι] := M;
            workset := workset ∪ { κ | κ ∈ I_fp( pp[ι] ∪ ⋃{ pp[ι'] | ι' ∈ { CalledProc(st) | st ∈ occ_stmt(A)(ι) } } ) }
        FI
    END
OD.
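A possible Python transcription of Algorithm 6.3.3 is sketched below under a hypothetical encoding: `formals` maps each formal procedure parameter ι to (decl(ι), pos(ι)), call statements are triples (enclosing procedure, CalledProc(st), procedure arguments), and `reachable` plays the role of A. `set.pop()` realizes the nondeterministic choice LET ι ∈ workset. Reading occ_stmt(A)(ι) as the call statements of A in which ι occurs as an argument is an assumption of this sketch.

```python
def potential_passability_worklist(ordinary, formals, calls, reachable):
    """Sketch of Algorithm 6.3.3 (hypothetical encoding, see the lead-in)."""
    # Initialization of the annotation array pp.
    pp = {i: {i} for i in ordinary}
    pp.update({i: set() for i in formals})
    # I_fp(p): the formal procedure parameters declared by procedure p.
    formals_of = {}
    for phi, (decl, _) in formals.items():
        formals_of.setdefault(decl, set()).add(phi)
    stmts = [c for c in calls if c[0] in reachable]
    workset = set(formals)  # workset := I_fp(Pi)
    # Iterative fixed point computation.
    while workset:
        iota = workset.pop()  # LET iota in workset; workset := workset \ {iota}
        decl, pos = formals[iota]
        kappas = {k for k, v in pp.items() if decl in v}
        m = set(pp[iota])
        for _, called, args in stmts:
            if called in kappas:
                m |= pp[args[pos]]
        if pp[iota] < m:  # proper subset: the annotation has grown
            pp[iota] = m
            # pp[iota] together with pp[CalledProc(st)] for every st passing iota
            sources = set(m)
            for _, called, args in stmts:
                if iota in args:
                    sources |= pp[called]
            for p in sources:
                workset |= formals_of.get(p, set())
    return pp


ordinary = {"main", "P", "Q", "R"}
formals = {"f": ("P", 0), "g": ("Q", 0)}
calls = [("main", "P", ("Q",)), ("P", "f", ("R",)), ("Q", "g", ())]
pp = potential_passability_worklist(ordinary, formals, calls, reachable=ordinary)
print(pp["f"], pp["g"])  # {'Q'} {'R'}
```

Whatever order `set.pop()` picks, the result is the same least solution, in line with Theorem 6.3.2.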

In order to simplify the formulation of the central properties of Algorithm 6.3.3, we abbreviate the values of workset and pp[ι] after the k-th execution of the while-loop by $workset^k$ and $pp^k[\iota]$, respectively. Using this notation, the first part of the following proposition follows immediately from the monotonicity of the union operator, while the second part is a consequence of the first one and the finiteness of $\Pi$, i.e., $\Pi$ contains only a finite number of procedures.

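Proposition 6.3.1.

1. $\forall \Pi \in Prog\ \forall \iota \in \mathcal{I}_p(\Pi)\ \forall k \in \mathbb{N}.\ pp^k[\iota] \subseteq pp^{k+1}[\iota]$

2. $\forall \Pi \in Prog\ \exists k \in \mathbb{N}.\ workset^k = \emptyset$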

Note that Proposition 6.3.1(2) guarantees the termination of Algorithm 6.3.3. Moreover, the following theorem shows that it terminates with the least solution $\mathcal{PP}_A$ of Equation System 6.3.1, which defines potential passability.

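Theorem 6.3.2 (Algorithm 6.3.3).

$$\forall \Pi \in Prog\ \forall A \in \mathcal{FR}\text{-}App(\Pi)\ \forall \iota \in \mathcal{I}_p(\Pi).\ \mathcal{PP}_A(\iota) = \bigcup \{\, pp^k[\iota] \mid k \geq 0 \,\}$$

In particular, for all $\iota \in \mathcal{I}_p(\Pi)$ we have $\mathcal{PP}_A(\iota) = pp[\iota]$ after the termination of Algorithm 6.3.3.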

Proof. Let fix be an arbitrary solution of Equation System 6.3.1 with respect to $A \in \mathcal{FR}\text{-}App(\Pi)$, i.e.,

$$fix(\iota) = \begin{cases} \{\iota\} & \text{if } \iota \in \mathcal{I}_{op}(\Pi) \\ \bigcup \{\, fix(ParLst(st)\downarrow_{pos(\iota)}) \mid st \in c\text{-}occ_{stmt(A)}(\{\, \kappa \mid decl(\iota) \in fix(\kappa) \,\}) \,\} & \text{otherwise} \end{cases} \tag{6.3}$$

and let $M^k$ throughout the proof denote the value of the variable M after the k-th execution of the while-loop. Then the central step in proving Theorem 6.3.2 is to check the following four invariants of Algorithm 6.3.3:

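1. $\forall k \in \mathbb{N}\ \forall \iota \in \mathcal{I}_p(\Pi).\ pp^k[\iota] \subseteq fix(\iota)$

2. $\forall k \in \mathbb{N}\ \forall \iota \in \mathcal{I}_{op}(\Pi).\ pp^k[\iota] = \{\iota\} = fix(\iota)$

3. $\forall k \in \mathbb{N}\ \forall \iota \in \mathcal{I}_{fp}(\Pi) \setminus workset^k.\ pp^k[\iota] = \bigcup \{\, pp^k[ParLst(st)\downarrow_{pos(\iota)}] \mid st \in c\text{-}occ_{stmt(A)}(\{\, \kappa \mid decl(\iota) \in pp^k[\kappa] \,\}) \,\}$

4. $\forall k \in \mathbb{N}.\ workset^k \subseteq \mathcal{I}_{fp}(\Pi)$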

For $k = 0$, the investigation of the initialization part of Algorithm 6.3.3 yields:

a) $\forall \iota \in \mathcal{I}_{op}(\Pi).\ pp^0[\iota] = \{\iota\}$

b) $\forall \iota \in \mathcal{I}_{fp}(\Pi).\ pp^0[\iota] = \emptyset$

c) $workset^0 = \mathcal{I}_{fp}(\Pi)$

Hence, invariant (3) holds trivially because of (c), and the invariants (1), (2), and (4) follow immediately from (a), (b), and (c).

In order to prove the induction step, let $k > 0$, and let ι be the formal procedure identifier chosen from the workset during the k-th execution of the while-loop. Under the induction hypothesis (IH) that the invariants (1) to (4) are satisfied for all $0 \leq l < k$, we have to show that the invariants (1) to (4) hold after the k-th execution of the while-loop. We obtain

$$M^k = pp^{k-1}[\iota] \cup \bigcup \{\, pp^{k-1}[ParLst(st)\downarrow_{pos(\iota)}] \mid st \in c\text{-}occ_{stmt(A)}(\{\, \kappa \mid decl(\iota) \in pp^{k-1}[\kappa] \,\}) \,\}$$

Moreover, we have

$$pp^k[\kappa] = \begin{cases} M^k & \text{if } \iota = \kappa \wedge M^k \setminus pp^{k-1}[\iota] \neq \emptyset \\ pp^{k-1}[\kappa] & \text{otherwise} \end{cases} \tag{6.4}$$

$$workset^k = \begin{cases} (workset^{k-1} \setminus \{\iota\}) \cup \{\, \kappa \mid \kappa \in \mathcal{I}_{fp}(pp^k[\iota] \cup \bigcup \{\, pp^k[\iota'] \mid \iota' \in \{ CalledProc(st) \mid st \in occ_{stmt(A)}(\iota) \} \,\}) \,\} & \text{if } M^k \setminus pp^{k-1}[\iota] \neq \emptyset \\ workset^{k-1} \setminus \{\iota\} & \text{otherwise} \end{cases} \tag{6.5}$$

Now, the invariants (2) and (3) follow immediately from equation (6.4) and the induction hypothesis (IH). Similarly, invariant (4) is a consequence of equation (6.5) and the induction hypothesis (IH). Thus, we are left with checking invariant (1). By means of equation (6.4) and the induction hypothesis (IH) it is sufficient to check invariant (1) for ι. If $M^k \setminus pp^{k-1}[\iota] = \emptyset$, invariant (1) is simply a consequence of the induction hypothesis (IH). Without loss of generality we can thus assume that there is a $\pi \in M^k \setminus pp^{k-1}[\iota]$. The induction step then follows from the following sequence of inclusions:

$$
\begin{aligned}
pp^k[\iota] = M^k &= pp^{k-1}[\iota] \cup \bigcup \{\, pp^{k-1}[ParLst(st)\downarrow_{pos(\iota)}] \mid st \in c\text{-}occ_{stmt(A)}(\{\, \kappa \mid decl(\iota) \in pp^{k-1}[\kappa] \,\}) \,\} \\
&\subseteq fix(\iota) \cup \bigcup \{\, fix(ParLst(st)\downarrow_{pos(\iota)}) \mid st \in c\text{-}occ_{stmt(A)}(\{\, \kappa \mid decl(\iota) \in fix(\kappa) \,\}) \,\} && (\text{(IH) for (1)}) \\
&= fix(\iota) && ((6.3))
\end{aligned}
$$

Theorem 6.3.2 is now an immediate consequence of the invariants (1), (2), and (3), and the emptiness of the workset after the termination of Algorithm 6.3.3, which is guaranteed by Proposition 6.3.1(2). □

6.3.2 An Efficient Variant 

In this section we present an alternative algorithm for computing potential 
passability. In comparison to Algorithm 6.3.3, the new algorithm organizes the workset more cleverly, which results in an improved estimate of its worst-case time complexity. The point is that an identifier is added to the workset only after its own annotation has actually grown, i.e., whenever an element is added to the workset, the global chain length has decreased.

Algorithm 6.3.4 (Computing Potential Passability Efficiently). 

Input: A program $\Pi \in Prog$, and a correct (safe) approximation $A \in \mathcal{FR}\text{-}App(\Pi)$ of formal reachability.

Output: For every procedure identifier $\iota \in \mathcal{I}_p(\Pi)$ occurring in $\Pi$ the least solution $\mathcal{PP}_A(\iota)$ of Equation System 6.3.1 (stored in pp[ι]).

Remark: The variable workset controls the iterative process, and the variable M stores the most recent approximation for the procedure identifier currently processed.

( Initialization of the annotation array pp and the variable workset )

FORALL ι ∈ I_p(Π) DO
    IF ι ∈ I_op(Π) THEN pp[ι] := {ι} ELSE pp[ι] := ∅ FI
OD;
workset := I_op(Π);

( Iterative fixed point computation )

WHILE workset ≠ ∅ DO
    LET ι ∈ workset
    BEGIN
        workset := workset \ {ι};
        FORALL ι' ∈ pp[ι] ∪ ⋃{ pp[ι''] | ι'' ∈ { CalledProc(st) | st ∈ p-occ_stmt(A)(ι) } } DO
            FORALL κ ∈ I_fp(ι') DO
                M := pp[κ] ∪ ⋃{ pp[ParLst(st)↓pos(κ)] | st ∈ c-occ_stmt(A)({ κ' | decl(κ) ∈ pp[κ'] }) };
                IF pp[κ] ⊊ M
                THEN
                    pp[κ] := M;
                    workset := workset ∪ { κ }
                FI
            OD
        OD
    END
OD.

Like Algorithm 6.3.3, Algorithm 6.3.4 terminates with the least solution of Equation System 6.3.1. Thus, it also computes potential passability.
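The efficient variant can be sketched in Python under a hypothetical encoding (`formals` maps ι to (decl(ι), pos(ι)); call statements are triples (enclosing procedure, CalledProc(st), procedure arguments); `reachable` stands for A). Following the pseudocode, the workset starts with the ordinary identifiers, and a formal parameter κ enters it only after pp[κ] has grown. Reading p-occ_stmt(A)(ι) as the statements of A that pass ι as an argument is an assumption of this sketch.

```python
def potential_passability_efficient(ordinary, formals, calls, reachable):
    """Sketch of Algorithm 6.3.4 (hypothetical encoding, see the lead-in)."""
    pp = {i: {i} for i in ordinary}
    pp.update({i: set() for i in formals})
    # I_fp(p): the formal procedure parameters declared by procedure p.
    formals_of = {}
    for phi, (decl, _) in formals.items():
        formals_of.setdefault(decl, set()).add(phi)
    stmts = [c for c in calls if c[0] in reachable]
    workset = set(ordinary)  # workset := I_op(Pi)
    while workset:
        iota = workset.pop()
        # outer loop range: pp[iota] and pp[CalledProc(st)] for statements passing iota
        targets = set(pp[iota])
        for _, called, args in stmts:
            if iota in args:
                targets |= pp[called]
        for proc in targets:
            for kappa in formals_of.get(proc, ()):  # inner loop: kappa in I_fp(proc)
                decl, pos = formals[kappa]
                kappas = {k for k, v in pp.items() if decl in v}
                m = set(pp[kappa])
                for _, called, args in stmts:
                    if called in kappas:
                        m |= pp[args[pos]]
                if pp[kappa] < m:  # only grown annotations re-enter the workset
                    pp[kappa] = m
                    workset.add(kappa)
    return pp


ordinary = {"main", "P", "Q", "R"}
formals = {"f": ("P", 0), "g": ("Q", 0)}
calls = [("main", "P", ("Q",)), ("P", "f", ("R",)), ("Q", "g", ())]
pp = potential_passability_efficient(ordinary, formals, calls, reachable=ordinary)
print(pp["f"], pp["g"])  # {'Q'} {'R'}
```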

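Theorem 6.3.3 (Algorithm 6.3.4).

$$\forall \Pi \in Prog\ \forall A \in \mathcal{FR}\text{-}App(\Pi)\ \forall \iota \in \mathcal{I}_p(\Pi).\ \mathcal{PP}_A(\iota) = \bigcup \{\, pp^k[\iota] \mid k \geq 0 \,\}$$

In particular, for all $\iota \in \mathcal{I}_p(\Pi)$ we have $\mathcal{PP}_A(\iota) = pp[\iota]$ after the termination of Algorithm 6.3.4.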

Central for proving this result is Proposition 6.3.2, where $workset^{(k,l,m)}$ and $pp^{(k,l,m)}[\iota]$ denote the values of workset and pp[ι] after the k-th execution of the while-loop, the l-th execution of the outer for-loop, and the m-th execution of the inner for-loop, respectively. Moreover, $<_{lex}$ denotes the lexicographic order on $\mathbb{N} \times \mathbb{N} \times \mathbb{N}$.

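Proposition 6.3.2.

1. $\forall \Pi \in Prog\ \forall \iota \in \mathcal{I}_p(\Pi)\ \forall k, l, m, k', l', m' \in \mathbb{N}.\ (k, l, m) <_{lex} (k', l', m') \Rightarrow pp^{(k,l,m)}[\iota] \subseteq pp^{(k',l',m')}[\iota]$

2. $\forall \Pi \in Prog\ \exists k \in \mathbb{N}.\ workset^k = \emptyset$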

Based on Proposition 6.3.2, the proof of Theorem 6.3.3 proceeds along the 
lines of the proof of Theorem 6.3.2, and is thus omitted. In addition to the 
induction on the number of executions of the while-loop it requires nested 
inductions on the number of executions of the outer and the inner for-loop. 



6.4 Main Results on Potential Passability 

In this section we present the main results concerning correctness (safety), 
precision, and computational complexity of potential passability. 



6.4.1 Correctness and Precision 

The Correctness Theorem 6.3.1 and the Passability Theorem 6.2.1 guarantee that potential passability is a correct (safe) approximation of formal passability and formal callability. Moreover, for programs of mode depth 2 without global formal procedure parameters, i.e., for $Prog_{fm(2),wgfpp}$, potential passability and formal passability coincide:

Theorem 6.4.1 ($Prog_{fm(2),wgfpp}$-Precision).

$$\forall \Pi \in Prog_{fm(2),wgfpp}\ \forall \iota \in \mathcal{I}_{fp}(\Pi).\ \mathcal{PP}_{\mathcal{FR}(\Pi)}(\iota) = \mathcal{FP}(\iota)$$

Proof. As the inclusion "⊇" is a consequence of the Safety Theorem 6.3.1, we are left with the inclusion "⊆". According to Theorem 6.3.2 we prove the equivalent formula

$$\forall k \in \mathbb{N}\ \forall \iota \in \mathcal{I}_{fp}(\Pi).\ pp^k[\iota] \subseteq \mathcal{FP}(\iota) \tag{6.6}$$

by induction on the number k of executions of the while-loop of Algorithm 6.3.3. Throughout this proof, let $M^k$ denote the value of the variable M after the k-th execution of the while-loop.

The case $k = 0$ is trivial according to the initialization of $pp^0[\iota]$, $\iota \in \mathcal{I}_{fp}(\Pi)$, with $\emptyset$. Thus, let $k > 0$, and consider the k-th execution of the while-loop under the induction hypothesis (IH) that (6.6) holds before the k-th execution of the while-loop. Let ι be the element currently chosen from the workset. Then we have:

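$$M^k = pp^{k-1}[\iota] \cup \bigcup \{\, pp^{k-1}[ParLst(st)\downarrow_{pos(\iota)}] \mid st \in c\text{-}occ_{stmt(\mathcal{FR}(\Pi))}(\{\, \kappa \mid decl(\iota) \in pp^{k-1}[\kappa] \,\}) \,\}$$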

Without loss of generality we can assume that there is a $\pi \in M^k \setminus pp^{k-1}[\iota]$, since otherwise the induction hypothesis (IH) would suffice. This implies that there is a statement $st \in c\text{-}occ_{stmt(\mathcal{FR}(\Pi))}(\kappa')$ for some $\kappa' \in \mathcal{I}_p(\Pi)$ with $decl(\iota) \in pp^{k-1}[\kappa']$ and $CalledProc(st) = \kappa'$.

Moreover, we have $ParLst(st)\downarrow_{pos(\iota)} = \kappa''$ and $\pi \in pp^{k-1}[\kappa'']$. Since $\Pi$ is a program with mode depth 2, property (6.1) yields $\kappa' \in \mathcal{I}_{op}(\Pi)$. This directly delivers $\kappa' = decl(\iota)$, and therefore $CalledProc(st) = decl(\iota)$. Next we must investigate two cases. First, if $\kappa'' \in \mathcal{I}_{op}(\Pi)$, we obtain $\kappa'' = \pi$, and therefore $ParLst(st)\downarrow_{pos(\iota)} = \pi$. Hence, $\pi \in \mathcal{FP}(\iota)$ is a consequence of $st \in c\text{-}occ_{stmt(\mathcal{FR}(\Pi))}(decl(\iota))$. Second, if $\kappa'' \in \mathcal{I}_{fp}(\Pi)$, the induction hypothesis (IH) yields $\pi \in \mathcal{FP}(\kappa'')$ because of $\pi \in pp^{k-1}[\kappa'']$. According to Definition 6.2.1, there is a program $\Pi'$ derived from $\Pi$ by the copy rule which contains, in the main part of the innermost generated block, a statement $st'$ with $CalledProc(st') = decl(\kappa'')$ and $ParLst(st')\downarrow_{pos(\kappa'')} = \pi$. Let $\Pi''$ be the successor of $\Pi'$ which results from applying the copy rule to the call statement $st'$ in $\Pi'$. Since $\Pi$ is a program without global formal procedure parameters, the main part of the innermost generated block of $\Pi''$ contains a copy $st''$ of $st$ in which $\kappa''$ is replaced by $\pi$, i.e., $CalledProc(st'') = decl(\iota)$ and $ParLst(st'')\downarrow_{pos(\iota)} = \pi$. Thus, Definition 6.2.1 yields $\pi \in \mathcal{FP}(\iota)$, which completes the proof of Theorem 6.4.1. □

Combining Theorem 6.2.3 and Theorem 6.4.1, we directly obtain that potential passability and formal callability coincide for programs of mode depth 2 without global formal procedure parameters:

Corollary 6.4.1 ($Prog_{fm(2),wgfpp}$-FC-Precision).

$$\forall \Pi \in Prog_{fm(2),wgfpp}\ \forall \iota \in \mathcal{I}_{fp}(\Pi).\ c\text{-}occ_{\mathcal{FR}(\Pi)}(\iota) \neq \emptyset \Rightarrow \mathcal{PP}_{\mathcal{FR}(\Pi)}(\iota) = \mathcal{FC}(\iota)$$

6.4.2 Complexity

The costs of our HO-DFA are the sum of the costs of computing a safe approximation of formal reachability information and the costs of computing potential passability according to this information by means of Algorithm 6.3.4. The complexity of the first step depends above all on the degree of precision desired for the approximation. In order to illustrate the range of the complexity of this step, recall that formal reachability is undecidable in general, and PSPACE-hard for finite mode languages [Ar3, Lai, Wi2], but that it is always safely approximated by the set of procedures occurring in a program. The complexity of the second step is given by the complexity of Algorithm 6.3.4. Given its input, a program $\Pi \in Prog$, and a safe approximation of formal reachability information, the worst-case time complexity of this algorithm can essentially be estimated by

$$O(|\mathcal{I}_{op}(\Pi)| + |\mathcal{I}_{fp}(\Pi)| \cdot |\mathcal{I}_p(\Pi)|)$$

which reflects the worst-case number of occurrences of procedure identifiers on the workset before Algorithm 6.3.4 terminates. Thus, we have:

Theorem 6.4.2 (Computational Complexity of Potential Passability).

Given a program $\Pi \in Prog$, and a safe approximation of formal reachability information, potential passability is computable in quadratic time.

It is important that for programs of $Prog_{fm(2),wgfpp}$ the statement of Theorem 6.4.2 can be strengthened considerably. The point here is that formal reachability is decidable in quadratic time for programs with mode depth 2 and a limit on the length of parameter lists, by solving a reachability problem in a graph with $|\Pi|$ nodes and $|Stmt_{call}(\Pi)|$ edges, as shown in [Ar3]. Combining this result with Theorem 6.4.1 and Theorem 6.4.2, we obtain:

Theorem 6.4.3 ($Prog_{fm(2),wgfpp}$-Complexity).

For all programs $\Pi \in Prog_{fm(2),wgfpp}$, formal callability is decidable in quadratic time, if there is a limit on the length of parameter lists.

For practical applications, the limit required on the length of parameter lists can be considered harmless. Thus, the results of this section can be summarized as follows. Potential passability

1. is a correct (safe) approximation of formal callability,

2. is efficiently computable for all programs $\Pi \in Prog$, and

3. is precise for formal callability for programs of $Prog_{fm(2),wgfpp}$.

As a consequence, we obtain the following corollary on Wirth’s Pascal (cf. 
[Wth, HW]): 

Corollary 6.4.2 (Wirth’s Pascal). 

Formal callability is decidable in quadratic time for programs of Wirth’s Pas- 
cal without global formal procedure parameters, if there is a limit on the length 
of parameter lists. 




7. The Interprocedural Setting 



In this chapter we complete the setting of our framework for interprocedural 
program optimization. We first introduce flow graph systems and interpro- 
cedural flow graphs as program representations, and subsequently, describe 
the interface connecting HO-DFA and IDFA. Central is then the extension 
of the two-step scheme of optimal intraprocedural program optimization to 
the interprocedural setting. It turns out that this scheme applies naturally to 
the interprocedural setting, too. One should note, however, that in contrast 
to the data flow analyses the transformations based thereon usually require 
additional care. 



7.1 Flow Graph Systems 

As usual in interprocedural program optimization, we represent a Prog-program $\Pi = (\pi_{\alpha_0}, \ldots, \pi_{\alpha_k})$ by means of a flow graph system $S = \{ G_{\alpha_0}, \ldots, G_{\alpha_k} \}$, $\alpha_j \in \mathbb{N}^*$. $S$ is a system of flow graphs with disjoint sets of nodes and edges, where every procedure $\pi$ of $\Pi$ is represented as a directed flow graph $G = (N, E, s, e)$ in the sense of Section 2.1.1. Nodes $n \in N_i$ represent the statements and edges $(n, m) \in E_i$ the nondeterministic (intraprocedural) branching structure of the underlying procedure $\pi_i$. In our setting, the nodes represent parallel assignments and procedure calls. We assume that the fictitious procedure $\pi_0$, which stands for the (unspecified) external procedures occurring in the environment of $\Pi$ (cf. Section 5.3), is represented by the flow graph $G_0 = (N_0, E_0, s_0, e_0)$ with node set $N_0 =_{df} \{s_0, n_0, e_0\}$ and edge set $E_0 =_{df} \{(s_0, n_0), (n_0, e_0)\}$. External variables and external procedure calls occurring in $\Pi$ are considered local variables and ordinary procedure calls of $G_0$. Moreover, $N^S =_{df} \bigcup \{ N_{\alpha_i} \mid i \in \{0, \ldots, k\} \}$ and $E^S =_{df} \bigcup \{ E_{\alpha_i} \mid i \in \{0, \ldots, k\} \}$ denote the sets of all nodes and edges of $S$. Additionally, $N_c^{op}$ and $N_c^{fp}$ denote the sets of ordinary and formal procedure call nodes, and $N_c =_{df} N_c^{op} \cup N_c^{fp}$ the set of all procedure call nodes of $S$.

Figure 7.1 shows an illustrative flow graph system using the convention that a procedure with name $\pi$, formal value parameters $f_1, \ldots, f_q$, formal reference parameters $z_1, \ldots, z_r$, formal procedure parameters $\phi_1, \ldots, \phi_s$, and local variables $v_1, \ldots, v_u$ is denoted by $\pi(f_1, \ldots, f_q : z_1, \ldots, z_r : \phi_1, \ldots, \phi_s) : v_1, \ldots, v_u$.







Fig. 7.1. Flow graph system 



7.1.1 The Functions fg, callee, caller, start, and end

For every flow graph system $S$ we define the functions fg, callee, caller, start, and end. Intuitively, fg maps every node of $S$ to the flow graph it is contained in, callee maps every procedure call node to the set of procedures it may call, caller maps every procedure to the set of call nodes from which it may be called, and start and end map every flow graph to its start node and end node, respectively. We remark that in the definition of callee, call nodes of $S$ are identified with the corresponding call statements of the program $\Pi$ underlying $S$. Thus, $CalledProc(n)$, $n \in N_c$, denotes the identifier of the called procedure.

Concerning the definition of the function callee, recall that $\mathcal{FR}\text{-}App(\Pi) \neq \emptyset$ because we always have $\Pi \in \mathcal{FR}\text{-}App(\Pi)$. Moreover, we recall that formal reachability is decidable for finite mode languages, and decidable in quadratic time for programs of mode depth 2 and a limit on the length of parameter lists (cf. [Ar3]).



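1. $fg : N^S \to S$ with $fg(n) =_{df} G_{\alpha_i}$ iff $n \in N_{\alpha_i}$

2. $callee : N_c \to \mathcal{P}(S)$ with

$$callee(n) =_{df} \begin{cases} \mathcal{PP}_A(CalledProc(n)) & \text{if } CalledProc(n) \neq G_0 \\ \{G_0\} & \text{otherwise} \end{cases}$$

where $A \in \mathcal{FR}\text{-}App(\Pi)$ is assumed to be a correct (safe) approximation (i.e., a superset) of the set of formally reachable procedures of $\Pi$.

3. $caller : S \to \mathcal{P}(N_c)$ with $caller(G_{\alpha_i}) =_{df} \{ n \mid G_{\alpha_i} \in callee(n) \}$.

4. $start : S \to \{s_{\alpha_0}, \ldots, s_{\alpha_k}\}$ with $start(G_{\alpha_j}) =_{df} s_{\alpha_j}$ for all $j \in \{0, \ldots, k\}$.

5. $end : S \to \{e_{\alpha_0}, \ldots, e_{\alpha_k}\}$ with $end(G_{\alpha_j}) =_{df} e_{\alpha_j}$ for all $j \in \{0, \ldots, k\}$.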
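To make the five functions concrete, the following sketch encodes a tiny flow graph system in Python. The class name `FlowGraphSystem`, its fields, and all node names are hypothetical; procedure names double as flow graph names, mirroring the identification used in the definition of callee, and the potential passability table `pp` is assumed to have been computed beforehand (for an ordinary procedure identifier it is the singleton containing that identifier).

```python
from dataclasses import dataclass


@dataclass
class FlowGraphSystem:
    """Hypothetical encoding of a flow graph system S (names are illustrative)."""
    graphs: dict       # flow graph name -> (start node, end node, list of nodes)
    called_proc: dict  # procedure call node n -> CalledProc(n)
    pp: dict           # potential passability PP_A: identifier -> set of graph names

    def fg(self, n):
        """The flow graph containing node n."""
        for g, (_, _, nodes) in self.graphs.items():
            if n in nodes:
                return g
        raise KeyError(n)

    def callee(self, n):
        """Procedures call node n may call; external calls are directed to G0."""
        called = self.called_proc[n]
        return {"G0"} if called == "G0" else set(self.pp[called])

    def caller(self, g):
        """Call nodes from which flow graph g may be called."""
        return {n for n in self.called_proc if g in self.callee(n)}

    def start(self, g):
        return self.graphs[g][0]

    def end(self, g):
        return self.graphs[g][1]


S = FlowGraphSystem(
    graphs={
        "G0": ("s0", "e0", ["s0", "n0", "e0"]),
        "main": ("s1", "e1", ["s1", "c0", "c1", "e1"]),
        "P": ("s2", "e2", ["s2", "c2", "e2"]),
        "Q": ("s3", "e3", ["s3", "e3"]),
    },
    called_proc={"c0": "G0", "c1": "P", "c2": "f"},  # c2 is a formal call f()
    pp={"P": {"P"}, "Q": {"Q"}, "f": {"Q"}},
)
print(S.fg("c2"), S.callee("c2"), S.caller("Q"))  # P {'Q'} {'c2'}
```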



7.1.2 The Interface Between HO-DFA and IDFA 

In this section we present the interface between HO-DFA and IDFA. It is 
given by the function callee, which imports the results of the HO-DFA, and 
allows us to feed them into the subsequent IDFA. It is thus the connecting link 
between HO-DFA and IDFA. Technically, it gives us the handle for treating 
formal procedure calls as nondeterministic higher-order branch statements 
during IDFA (cf. Definition 8.3.2). In Section 7.2 this will be made explicit
by constructing the interprocedural flow graph of a flow graph system (cf. 
Algorithm 7.2.1). 



7.2 Interprocedural Flow Graphs 

In contrast to the intraprocedural control flow, a flow graph system $S$ does not explicitly represent the interprocedural control flow caused by procedure calls. As usual, this is achieved by combining the flow graphs of $S$ into an interprocedural flow graph $G^* = (N^*, E^*, s^*, e^*)$, whose start and end nodes $s^*$ and $e^*$ are given by the start and end nodes $s_1$ and $e_1$ of the main procedure of the underlying program, respectively (cf. [My, SP]). For every flow graph system $S$, the interprocedural flow graph $G^*$ corresponding to $S$ is then constructed by means of the three-step procedure of Algorithm 7.2.1, which is applied to all procedure call nodes of $S$. Note that this algorithm replaces every formal procedure call by a set of ordinary procedure calls, which is given by the set of procedures that are potentially passable to the formal call under consideration.



Algorithm 7.2.1 (Constructing the Interprocedural Flow Graph). 

Let $S$ be a flow graph system, and let $\{x_1, \ldots, x_s\}$ be the set of external variables occurring in $S$. Then, for every procedure call node $n \in N_c$, do:

1. Replace $n$ by a set of nodes containing for every identifier $\iota \in callee(n)$ a pair of new nodes, the call node $n_C(\iota)$ and the return node $n_R(\iota)$, where $n_C(\iota)$ has the same set of predecessors as $n$ but no successors, and $n_R(\iota)$ has the same set of successors as $n$ but no predecessors.

2. Attach $n_C(\iota)$ with the assignment

$$\begin{cases} (f_1, \ldots, f_q, z_1, \ldots, z_r, v_1, \ldots, v_u) := (t_1, \ldots, t_q, y_1, \ldots, y_r, \Omega, \ldots, \Omega) & \text{if } \iota \neq G_0 \\ (f_{0_1}, \ldots, f_{0_q}, z_{0_1}, \ldots, z_{0_r}) := (t_1, \ldots, t_q, y_1, \ldots, y_r) & \text{otherwise} \end{cases}$$

and $n_R(\iota)$ with the assignment

$$\begin{cases} (f_1, \ldots, f_q, z_1, \ldots, z_r, v_1, \ldots, v_u) := (\Omega, \ldots, \Omega) & \text{if } \iota \neq G_0 \\ (x_1, \ldots, x_s) := (\Omega, \ldots, \Omega) & \text{otherwise} \end{cases}$$

where $t_1, \ldots, t_q$ and $y_1, \ldots, y_r$ are the value and reference parameter arguments of $n$, where $f_1, \ldots, f_q, z_1, \ldots, z_r, v_1, \ldots, v_u$ are the formal parameters and local variables of the called procedure $\iota$ for $\iota \neq G_0$, where $f_{0_1}, \ldots, f_{0_q}, z_{0_1}, \ldots, z_{0_r}$ are new identifiers not occurring in $\Pi$ for $\iota = G_0$, and where "$\Omega$" denotes the special value "undefined".

3. For every identifier $\iota \in callee(n)$, add an edge from $n_C(\iota)$ to $start(decl(\iota))$, and from $end(decl(\iota))$ to $n_R(\iota)$.
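Steps 1 and 3 of Algorithm 7.2.1 can be sketched in Python as a transformation of edge sets; step 2, attaching the parameter assignments, is omitted here. This is an illustration under assumptions: the function name, the tuple encoding of the new nodes $n_C(\iota)$ and $n_R(\iota)$, and the example graph are hypothetical, and $start(decl(\iota))$ and $end(decl(\iota))$ are looked up directly under the procedure name $\iota$.

```python
def build_interprocedural_edges(edges, callee, start, end):
    """Sketch of steps 1 and 3 of Algorithm 7.2.1 (step 2 omitted).

    edges:  set of intraprocedural edges (m, n) of the flow graph system S
    callee: dict mapping a procedure call node n to callee(n)
    start:  dict mapping a procedure to the start node of its flow graph
    end:    dict mapping a procedure to the end node of its flow graph
    """
    def call(n, i):  # the new call node n_C(i)
        return (n, i, "C")

    def ret(n, i):   # the new return node n_R(i)
        return (n, i, "R")

    result = set()
    # Step 1: each n_C(i) inherits the predecessors of n, each n_R(i) its successors.
    for m, n in edges:
        sources = [ret(m, i) for i in callee[m]] if m in callee else [m]
        targets = [call(n, i) for i in callee[n]] if n in callee else [n]
        result.update((a, b) for a in sources for b in targets)
    # Step 3: link n_C(i) to the start node and the end node to n_R(i).
    for n, procs in callee.items():
        for i in procs:
            result.add((call(n, i), start[i]))
            result.add((end[i], ret(n, i)))
    return result


# main: s1 -> c1 -> e1, where c1 is a formal call with callee(c1) = {Q, T}
edges = {("s1", "c1"), ("c1", "e1"), ("sQ", "eQ"), ("sT", "eT")}
E = build_interprocedural_edges(edges, {"c1": {"Q", "T"}},
                                start={"Q": "sQ", "T": "sT"},
                                end={"Q": "eQ", "T": "eT"})
print(len(E))  # 10
```

The original call node c1 disappears; it is replaced by one call/return pair per potentially called procedure, exactly as the formal call is replaced by a set of ordinary calls.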

In the following, we denote the sets of call nodes and return nodes of $N^*$ by $N^*_C$ and $N^*_R$, respectively. Moreover, we denote the sets of immediate interprocedural predecessors and successors of $n$ by $pred^*(n) =_{df} \{ m \mid (m, n) \in E^* \}$ and $succ^*(n) =_{df} \{ m \mid (n, m) \in E^* \}$, respectively.

Figure 7.2 shows the interprocedural flow graph which corresponds to the flow graph system of Figure 7.1. For clarity, edges starting in nodes of $N^*_C$ and $N^*_R$ are displayed by dashed and dotted lines, respectively.

Fig. 7.2. The interprocedural flow graph

7.2.1 Interprocedural Paths 

The notion of a finite path introduced in Section 2.1.1 applies to interprocedural flow graphs as well. However, in contrast to the intraprocedural setting, not every finite path of an interprocedural flow graph represents a valid execution. This is because of the special nature of procedure calls: for example, in Figure 7.2 the path $(1, 3_C, 9, 10_C, 5, 7, 8, 10_R)$ is possible, while the path $(1, 3_C, 9, 10_C, 5, 7, 8, 6_R)$ is not, since the procedure entered via $10_C$ must return to $10_R$. This leads to the notion of interprocedural (or interprocedurally valid) paths, which was originally introduced by Sharir and Pnueli by means of an algorithmic definition (cf. [SP]). Below we present a definition in terms of words of context-free languages, which was also suggested by Sharir and Pnueli, however, without giving details.

Definition 7.2.2 (Path Grammar and Path Language). 

Let $G^*$ be an interprocedural flow graph.

1. The interprocedural path grammar $\mathcal{G}(G^*)$ of $G^*$ is the triple $(V, T, P)$, where

a) $V =_{df} \{ V_n \mid n \in N^* \}$ denotes the set of non-terminal symbols,

b) $T =_{df} N^*$ the set of terminal symbols, and

c) $P =_{df} \{ V_n \to n\, V_m \mid n \notin \{e_0, \ldots, e_k\} \wedge m \in succ^*(n) \setminus N^*_C \} \cup \{ V_n \to n\, V_{m_C(\iota)}\, V_{m_R(\iota)} \mid m_C(\iota) \in succ^*(n) \cap N^*_C \} \cup \{ V_e \to e \mid e \in \{e_0, \ldots, e_k\} \}$ the set of context-free production rules.

2. The path language induced by a node $n \in N^* \setminus N^*_R$ is defined by:

$$\mathcal{L}_n(\mathcal{G}(G^*)) =_{df} \{ p \in T^* \mid \exists q \in (V \cup T)^*.\ V_n \Rightarrow^* q \wedge p \text{ is a prefix of } q \}$$

Denoting the set of all finite paths from $m$ to $n$ in an interprocedural flow graph $G^*$ by $P[m, n]$, we can now define:

Definition 7.2.3 (Interprocedural Paths). 

Let $G^*$ be an interprocedural flow graph.

1. A path $p \in P[m, n]$ is an interprocedural path of $G^*$ if and only if $p \in \mathcal{L}_m(\mathcal{G}(G^*))$.

2. $IP[m, n]$ denotes the set of all interprocedural paths from $m$ to $n$, $IP[m, n[$ the set of all interprocedural paths from $m$ to a predecessor of $n$, and $IP]m, n]$ the set of all interprocedural paths from a successor of $m$ to $n$.

3. An interprocedural path reaching the end node of $G^*$ is called terminating.
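Because every production either appends one node or, at a call node, schedules the matching return node behind the callee, membership of a path starting in $m$ in $\mathcal{L}_m(\mathcal{G}(G^*))$ can be checked with a stack of pending return nodes. The Python sketch below does this on a small hypothetical graph in which two call nodes cA and cB call the same procedure P, mirroring the kind of valid and invalid paths discussed for Figure 7.2; the function name, node names, and encoding are assumptions.

```python
def is_interprocedural_path(path, edges, call_nodes, end_nodes, return_of):
    """Check path in L_m(G(G*)) for m = path[0] using a stack of pending returns.

    edges:      edge set E* of G*
    call_nodes: the set N*_C
    end_nodes:  the end nodes e_0, ..., e_k of all procedures
    return_of:  maps each call node n_C(i) to its return node n_R(i)
    """
    if not path:
        return False
    pending = []  # return nodes scheduled by V_n -> n V_mC V_mR, innermost last
    for prev, node in zip(path, path[1:]):
        if (prev, node) not in edges:
            return False  # not even a finite path of G*
        if prev in end_nodes:
            # V_e -> e closes the callee; the next node must be the pending return
            if not pending or pending.pop() != node:
                return False
        if node in call_nodes:
            pending.append(return_of[node])
    return True


E = {("s1", "cA"), ("s1", "cB"), ("cA", "sP"), ("cB", "sP"), ("sP", "eP"),
     ("eP", "rA"), ("eP", "rB"), ("rA", "e1"), ("rB", "e1")}
args = (E, {"cA", "cB"}, {"eP", "e1"}, {"cA": "rA", "cB": "rB"})
print(is_interprocedural_path(["s1", "cA", "sP", "eP", "rA", "e1"], *args))  # True
print(is_interprocedural_path(["s1", "cA", "sP", "eP", "rB"], *args))        # False
```

As in the grammar, a path that starts inside a procedure may reach that procedure's end node but cannot continue to a return node afterwards, because no return is pending.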

Additionally, we introduce the notion of matching call and return nodes, which simplifies arguments about interprocedural paths.

Definition 7.2.4 (Matching Call and Return Nodes).

Let $G^*$ be an interprocedural flow graph, and let $p$ be an interprocedural path of $G^*$ starting in $m$. Two occurrences of a call node and a return node on $p$ are said to match each other if the corresponding occurrences of their non-terminals in a derivation of a word $q \in \mathcal{L}_m(\mathcal{G}(G^*))$, of which $p$ is a prefix, result from the same application of a production rule of $\mathcal{G}(G^*)$.

7.2.2 Complete Interprocedural Paths

In addition to interprocedural paths, we need the notion of complete interprocedural paths, which are important for determining the semantics of procedure calls. Complete interprocedural paths are interprocedural paths $p$ from $start(fg(n))$ to $n$ which are characterized by the fact that all procedure calls occurring on $p$ are completed by a subsequent return. This guarantees that the occurrences of $start(fg(n))$ and $n$ belong to the same procedure incarnation.

Definition 7.2.5 (Complete Interprocedural Paths). 

1. An interprocedural path $p \in IP[start(fg(n)), n]$ is called complete if it possesses equally many occurrences of procedure call and return nodes:

$$|\{ i \mid p_i \in N^*_C \}| = |\{ i \mid p_i \in N^*_R \}|$$

2. $CIP[start(fg(n)), n]$ and $CIP[start(fg(n)), n[$ denote the set of all complete interprocedural paths from $start(fg(n))$ to $n$, and from $start(fg(n))$ to a predecessor of $n$, respectively.

The following property of interprocedural paths expresses that the definition 
above actually realizes the desired intention. Essentially, this is a consequence 
of the form of the context-free production rules of $\mathcal{G}(G^*)$.

Lemma 7.2.1. Let $p \in IP[m, n]$ be an interprocedural path, and let $(p_i, p_j)$ and $(p_{i'}, p_{j'})$ be two pairs of matching call and return nodes of $p$. Then the integer intervals $[i : j]$ and $[i' : j']$ are either disjoint or one is included in the other.
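The following Python sketch computes the matching pairs of a path by pairing each return occurrence with the innermost still-open call occurrence (which is how the productions $V_n \to n\, V_{m_C(\iota)}\, V_{m_R(\iota)}$ nest), then checks the nesting property of Lemma 7.2.1 and the counting condition of Definition 7.2.5. It assumes that its input is already an interprocedural path; the function and node names are hypothetical.

```python
def matching_pairs(path, call_nodes, return_nodes):
    """Index pairs (i, j) of matching call and return occurrences on a path
    that is assumed to be interprocedurally valid already."""
    open_calls, pairs = [], []
    for idx, node in enumerate(path):
        if node in call_nodes:
            open_calls.append(idx)
        elif node in return_nodes:
            pairs.append((open_calls.pop(), idx))  # innermost open call
    return pairs


def properly_nested(pairs):
    """Lemma 7.2.1: intervals [i : j] of matching pairs are disjoint or nested."""
    for i, j in pairs:
        for k, l in pairs:
            disjoint = j < k or l < i
            nested = (i <= k and l <= j) or (k <= i and j <= l)
            if not (disjoint or nested):
                return False
    return True


def is_complete(path, call_nodes, return_nodes):
    """Counting condition of Definition 7.2.5."""
    calls = sum(node in call_nodes for node in path)
    returns = sum(node in return_nodes for node in path)
    return calls == returns


p = ["s1", "cA", "sP", "cB", "sQ", "eQ", "rB", "eP", "rA", "e1"]
pairs = matching_pairs(p, {"cA", "cB"}, {"rA", "rB"})
print(pairs)                                                      # [(3, 6), (1, 8)]
print(properly_nested(pairs), properly_nested([(1, 4), (2, 6)]))  # True False
print(is_complete(p, {"cA", "cB"}, {"rA", "rB"}))                 # True
```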

Figure 7.3 illustrates the pattern stated in Lemma 7.2.1, where $\{ (n_{C_i}, n_{R_i}) \mid i \in \{1, \ldots, 4\} \}$ are assumed to be pairs of matching call nodes and return nodes of $p \in IP[m, n]$.

Fig. 7.3. Complete interprocedural paths



The following lemma characterizes another property of matching call and return nodes, which can easily be proved, too.

Lemma 7.2.2. Let $s \in \{s_0, \ldots, s_k\}$, let $p \in IP[s, n]$, and let $(p_i, p_j)$ be a pair of matching call and return nodes of $p$. Then we have:

$$p\,]i, j[\ \in CIP[start(callee(p_i)), end(callee(p_i))]$$

Remark 7.2.1. If the underlying program $\Pi$ consists of a single procedure $\pi$ only, the flow graph system $S$ and the interprocedural flow graph $G^*$ collapse to the flow graph $G$ of $\pi$. In this case the interprocedural framework coincides with the standard framework for intraprocedural optimization of Chapter 2.

Conventions. In the following we restrict our attention to programs of Prog which satisfy the sfmr-property. We recall that this property is decidable for Prog [Ka]. In particular, it holds for all programs without static procedure nesting or without formal procedure calls. Additionally, we assume that programs containing formal procedure calls have been analyzed by means of the HO-DFA of Chapter 6. Thus, we have potential passability information for them.

For every program $\Pi$ satisfying this premise, we denote the flow graph system and the interprocedural flow graph representing $\Pi$ by $S$ and $G^*$. Without loss of generality we assume that every node $n \in N^*$ lies on an interprocedural path from $s^*$ to $e^*$. We denote the call nodes and return nodes of $G^*$ corresponding to a procedure call node $n \in N_c$ of $S$ by $n_C(\iota)$ and $n_R(\iota)$, respectively, where $\iota \in callee(n)$. For ordinary procedure call nodes $n \in N_c^{op}$ of $S$, we usually omit the explicit addition of $\iota$ because the procedure called by $n$ is uniquely determined. Conversely, for every call node or return node $n \in N^*_C \cup N^*_R$ of $G^*$ we denote the corresponding procedure call node of $S$ by $n_S$. Additionally, we identify the set $N^S$ of nodes of $S$ with the set $N^* \setminus N^*_R$ of nodes of $G^*$ in order to obtain an interpretation-independent notion of program point. In particular, we identify every node $n \in N_c$ with the set of nodes $\{ n_C(\iota) \mid \iota \in callee(n) \}$. This allows concise notations; e.g.,

$$\forall n \in N_c\ \forall p \in \bigcup \{ IP[s^*, n_C(\iota)[ \mid \iota \in callee(n) \}$$

can be abbreviated by

$$\forall n \in N_c\ \forall p \in IP[s^*, n[.$$

It is also the key for handling backward analyses in our framework by forward analyses, simply by inverting the flow of control (cf. Section 8.7).



7.3 Interprocedural Program Optimization 

In this section we extend the two-step approach of optimal intraprocedural program optimization to the interprocedural setting. As in the intraprocedural case, we are interested in program transformations which are provably optimal with respect to formal optimality criteria. The central contribution of this section is to demonstrate that optimal interprocedural program optimization can be organized by the same two-step approach as its intraprocedural counterpart. In the first step, we fix a class of interprocedural program transformations $\mathcal{T}$ and a formal optimality criterion $\mathcal{O}$. In the second step, we fix a transformation $Tr_{opt} \in \mathcal{T}$ and prove that it is $\mathcal{O}$-optimal. Like intraprocedural optimizations, interprocedural ones are usually defined in terms of appropriate program properties $\varphi$, whose validity must be verified in order to perform the transformation $Tr_{opt}$ under consideration. This is the task of interprocedural DFA (IDFA). As in the intraprocedural setting, the program properties involved must be computed precisely (or at least conservatively, i.e., safely, approximated) by the IDFA in order to guarantee that the transformation induced by its results is $\mathcal{O}$-optimal. This leads us to the notion of $\varphi$-precise ($\varphi$-correct) IDFA-algorithms.

In Section 7.3.1 we present the two-step scheme of optimal interprocedural 
program optimization, and subsequently, we introduce in Section 7.3.2 the 
notion of precise (correct) IDFA-algorithms. 

7.3.1 Provably Optimal Interprocedural Program Transformations 

Let $S$ be a flow graph system, let $\mathcal{T}$ be a class of interprocedural program transformations, let $Tr \in \mathcal{T}$ be a transformation of $\mathcal{T}$, and let $S_{Tr}$ denote the flow graph system resulting from the application of $Tr$ to $S$. Additionally, let $\mathcal{S}_\mathcal{T} =_{df} \{ S \} \cup \{ S_{Tr} \mid Tr \in \mathcal{T} \}$ denote the set of all programs resulting from a transformation of $\mathcal{T}$, extended by $S$ itself. Following the lines of Section 2.1.2, optimal interprocedural program optimization proceeds along the following two-step procedure:



Step 1: Fix a class of interprocedural program transformations $\mathcal{T}$ and a relation $\le_\mathcal{T} \;\subseteq\; \mathcal{S}_\mathcal{T} \times \mathcal{S}_\mathcal{T}$.



Intuitively, the relation $\le_\mathcal{T}$ compares the “quality” of transformations $Tr, Tr' \in \mathcal{T}$. It is usually given by a quasi-order or a partial order, and $S_{Tr} \le_\mathcal{T} S_{Tr'}$ can informally be read as “$Tr$ is better than $Tr'$”. This induces the interprocedural version of $\mathcal{O}_{\le_\mathcal{T}}$-optimality:

Definition 7.3.1 ($\mathcal{O}_{\le_\mathcal{T}}$-Optimality).

A transformation $Tr \in \mathcal{T}$ is $\mathcal{O}_{\le_\mathcal{T}}$-optimal if for all $Tr' \in \mathcal{T}$: $S_{Tr} \le_\mathcal{T} S_{Tr'}$.
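For a finite class of transformations, Definition 7.3.1 amounts to a simple universal check. The following Python sketch assumes that the relation $\le_\mathcal{T}$ on the transformed programs is supplied as a predicate on transformation names; the names `is_optimal`, `cost`, and `leq` and the cost-based quality relation are hypothetical and serve only as an illustration.

```python
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")


def is_optimal(tr: X, transformations: Iterable[X],
               leq: Callable[[X, X], bool]) -> bool:
    """Tr is O-optimal iff S_Tr <=_T S_Tr' holds for every Tr' of the class."""
    return all(leq(tr, other) for other in transformations)


# Hypothetical example: every transformed program S_Tr is rated by a cost, and
# S_Tr <=_T S_Tr' is read as cost(Tr) <= cost(Tr'), which is a quasi-order.
cost = {"Tr_a": 5, "Tr_b": 3, "Tr_c": 3, "Tr_d": 4}


def leq(a: str, b: str) -> bool:
    return cost[a] <= cost[b]


optimal = [t for t in cost if is_optimal(t, cost, leq)]
print(optimal)  # ['Tr_b', 'Tr_c']
```

Because a quasi-order need not be antisymmetric, both `Tr_b` and `Tr_c` qualify as optimal even though they may be different transformations.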



Step 2: Fix a transformation $Tr_{opt} \in \mathcal{T}$ and prove that it is $\mathcal{O}_{\le_\mathcal{T}}$-optimal.
Note that the two-step scheme above evolves from its intraprocedural counterpart just by replacing the flow graph $G$, which represents a single procedure of the argument program $\Pi$, by the flow graph system $S$, which represents all its procedures. This analogy can be continued. In particular, the interprocedural transformations $Tr \in \mathcal{T}$ are also typically defined in terms of a set of program properties $\Phi$, where every $\varphi \in \Phi$ is a pair $(N\text{-}\varphi, X\text{-}\varphi)$ of functions from the set of nodes of $S$ to the set of Boolean truth values $\mathcal{B}$, i.e.,

$$N\text{-}\varphi,\ X\text{-}\varphi : N^S \to \mathcal{B}$$

Though in the interprocedural setting the definition of a property $\varphi$ is usually more complex than in the intraprocedural setting, because the effects of procedure calls must be taken into account, the underlying intuition is the same: $N\text{-}\varphi(n)$ and $X\text{-}\varphi(n)$, $n \in N^S$, indicate whether $\varphi$ holds at the entry and at the exit of the argument node $n$, respectively.

Finally, it is worth noting that proving the $\mathcal{O}_{\le_\mathcal{T}}$-optimality of an interprocedural transformation $Tr_{opt} \in \mathcal{T}$ does not rely on the algorithms used for computing the program properties involved in $Tr_{opt}$. In fact, the properties of these algorithms can be proved separately, which, as in the intraprocedural case, structures and simplifies the overall proof by decomposing it into two independent steps.

7.3.2 Provably Precise Interprocedural Data Flow Analyses 

In the context of our two-step approach for optimal interprocedural program optimization, the task of IDFA is to compute the program properties $\varphi$ which are involved in the program transformation $Tr_{opt}$. As in the intraprocedural case, this requires static analyses of $S$, which are performed by IDFA-algorithms computing the sets of program points enjoying the program properties $\varphi$. This leads directly to the notions of correctness and precision of an IDFA-algorithm. Intuitively, an IDFA-algorithm is $\varphi$-correct if it computes a subset of the nodes of $S$ enjoying $\varphi$, and it is $\varphi$-precise if it computes exactly this set.¹ Once the IDFA-algorithms have been proved precise for the program properties involved in $Tr_{opt}$, it is usually easy to perform the transformation itself, and the program resulting from this transformation is guaranteed to be $\mathcal{O}_{\le_\mathcal{T}}$-optimal.² In the following chapter we will show how to construct correct and precise IDFA-algorithms. Central will be the stack-based framework for interprocedural abstract interpretation of [KS1].
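The informal distinction between correctness and precision can be phrased as a comparison of node sets. This Python sketch only mirrors the intuition given above, not the formal definitions of Section 8.6, and the node names and analysis results are hypothetical.

```python
def is_phi_correct(computed: set, holds: set) -> bool:
    """phi-correct: every node reported by the analysis really enjoys phi."""
    return computed <= holds


def is_phi_precise(computed: set, holds: set) -> bool:
    """phi-precise: the analysis reports exactly the nodes enjoying phi."""
    return computed == holds


holds = {"n1", "n3", "n4"}          # nodes that actually enjoy phi (hypothetical)
safe_result = {"n1", "n4"}          # a conservative analysis result
exact_result = {"n1", "n3", "n4"}   # a precise analysis result

print(is_phi_correct(safe_result, holds), is_phi_precise(safe_result, holds))    # True False
print(is_phi_correct(exact_result, holds), is_phi_precise(exact_result, holds))  # True True
```

A transformation driven by `safe_result` remains sound but misses the opportunity at `n3`, which illustrates why correctness alone usually does not yield optimality.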



¹ $\varphi$-correctness and $\varphi$-precision are formally defined in Section 8.6.
² $\varphi$-correctness is usually not sufficient to draw this conclusion.
In this chapter we extend the theory of abstract interpretation interprocedurally. The point of this extension, which proceeds essentially along the lines of [KS1], is to mimic the operational behaviour of run-time systems of Algol-like programming languages. Central is the introduction of data flow analysis stacks (DFA-stacks) and return functions. DFA-stacks can be considered the compile-time equivalent of the run-time stacks of run-time systems. While the DFA-information of interest is encoded, as in the intraprocedural case, by the elements of a lattice, the local semantic functions defining the abstract semantics of elementary statements are enhanced: interprocedurally, they work on DFA-stacks composed of lattice elements instead of on lattice elements only. Return functions, finally, mimic the effect of a return from a (recursive) procedure call on the run-time stack. In comparison to the presentation of [KS1], the development is enhanced in order to handle formal procedure calls and external procedures.



8.1 Data Flow Analysis Stacks 

Considering a program with recursive procedures and local variables, it is important to note that there are potentially infinitely many copies of the local variables at run-time. In fact, every procedure call occurring in a run-time execution causes the creation of a new copy of the local variables of the procedure called, which are removed after finishing the call. When a recursive call is finished, it is important that the local variables of the enclosing, but not yet finished call of the same procedure become valid (“visible”) again. Technically, a run-time system achieves this by maintaining a run-time stack recording the part of the history which will become relevant after returning from nested procedure calls. These effects must be modeled properly by an interprocedural abstract interpretation. In essence, this requires working on stacks composed of lattice elements instead of on the lattice elements only, which are assumed to represent the DFA-information of interest. This gives rise to the set of DFA-stacks STACK as the compile-time equivalent of run-time stacks: STACK is the set of all non-empty stacks with entries of $\mathcal{C}$. Like run-time stacks, DFA-stacks record the part of the history which will become relevant after returning from nested procedure calls. They directly reflect the nesting of procedure incarnations according to the current call sequence. We remark that DFA-stacks can be manipulated by means of the typical stack operations:¹



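1. $newstack : \mathcal{C} \to \mathit{STACK}$
2. $push : \mathit{STACK} \times \mathcal{C} \to \mathit{STACK}$
3. $pop : \mathit{STACK} \to \mathit{STACK}$
4. $top : \mathit{STACK} \to \mathcal{C}$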

Intuitively, $newstack(c)$ creates a new stack with the single entry $c$, $push$ puts a new entry on top of the argument stack, $pop$ removes the top entry, and $top$ delivers the content of the top entry while not affecting the argument stack.² Thus, only the top entries of the stacks can be affected by these operations. As we will see below, this is sufficient for our purposes.
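The four operations can be sketched with immutable Python tuples, where the last tuple element plays the role of the top entry. This representation is our own assumption, not part of the framework. Note that `pop` may yield an empty tuple as an intermediate value, for instance inside `push(pop(stk), c)` applied to a one-entry stack. The enclosing `push` then restores a non-empty stack.

```python
from typing import Any, Tuple

Stack = Tuple[Any, ...]  # a DFA-stack as a tuple; the last element is the top entry


def newstack(c: Any) -> Stack:
    """Create a new stack with the single entry c."""
    return (c,)


def push(stk: Stack, c: Any) -> Stack:
    """Put c on top of stk; stk itself is left unchanged (tuples are immutable)."""
    return stk + (c,)


def pop(stk: Stack) -> Stack:
    """Remove the top entry. An empty result occurs only as an intermediate
    value, e.g. in push(pop(stk), c) for a one-entry stack."""
    return stk[:-1]


def top(stk: Stack) -> Any:
    """Deliver the content of the top entry without affecting the stack."""
    return stk[-1]


s = push(newstack("c1"), "c2")
print(top(s), pop(s), s)  # c2 ('c1',) ('c1', 'c2')
```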

As indicated above, DFA-stacks are an abstract version of the run-time 
stacks of run-time systems, which are used for maintaining the activation 
records of different procedure incarnations. Intuitively, the top entry of a 
DFA-stack contains the data flow information corresponding to the currently
valid activation record,³ while the data flow information of the remaining
stack entries corresponds to activation records of preceding but not yet finished
procedure calls. In contrast to a concrete run-time stack, however, whose en- 
tries are organized by means of static and dynamic link chains, and where 
variables being global for the currently activated procedure are accessed by 
means of the static link chain, the entries of a DFA-stack are assumed to 
contain all the information concerning the current procedure incarnation, 
i.e., also the information related to global variables. The usual static and 
dynamic link chains are just a technical means for enhancing the efficiency of 
run-time systems. In our abstract framework DFA-stacks of potentially un- 
limited size occur only in the interprocedural meet over all paths approach. 
This approach, however, is only conceptually important in order to define 
the specifying solution of an IDFA-problem. In the practically relevant in- 
terprocedural maximal fixed point approach only DFA-stacks of at most two 
entries arise (cf. Remark 8.3.1). This allows us to omit modeling link chains. 
Moreover, this also allows us to work with local semantic functions, which 
only affect the top entries of DFA-stacks. 



¹ A definition of (DFA-) stacks in terms of an abstract data type can be found, e.g., in [Gu].

² The operation $newstack$ is considered here instead of the usual $emptystack : \mathit{STACK}$ in order to exclude empty stacks, which are irrelevant in our framework.

³ Therefore, we are never dealing with empty stacks.
8.2 Local Abstract Semantics

Basically, the local abstract semantics of an interprocedural flow graph $G^*$ is given by a local semantic functional

$$[\![\ ]\!]' : N^* \to (\mathcal{C} \to \mathcal{C})$$

as in the intraprocedural case. It gives meaning to every node $n$ of $G^*$ in terms of a transformation on $\mathcal{C}$. Without loss of generality we assume that every node $n \in \{ start(G), end(G) \mid G \in S \}$ is associated with the identity on $\mathcal{C}$, i.e., $[\![n]\!]' = Id_{\mathcal{C}}$. Moreover, we assume that the node $n_0$ of $G_0$ representing the fictitious procedure $\pi_0$ of the underlying program $\Pi$ is associated with the least element of the set of functions on $\mathcal{C}$, denoted by $[\mathcal{C} \to \mathcal{C}]$, i.e., $[\![n_0]\!]' = \bot_{[\mathcal{C} \to \mathcal{C}]}$, where $\bot_{[\mathcal{C} \to \mathcal{C}]}(c) =_{df} \bot$ for all $c \in \mathcal{C}$. Intuitively, this means that we do not assume anything about the external procedures occurring in the environment of $\Pi$. Note that the effects of variable aliasing caused by reference parameters are not explicitly treated at this stage. At the current level of detail, they are encoded in the semantic functions defined above. They must be made explicit only when considering a concrete application (cf. [KRS4]).

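Next we define a second semantic functional, which works on DFA-stacks. It is induced by a lattice $\mathcal{C}$, a local semantic functional $[\![\ ]\!]'$, and a return functional $\mathcal{R} : N^*_R \to (\mathcal{C} \times \mathcal{C} \to \mathcal{C})$, which is described below and, intuitively, models the effect of returning from a procedure call:

$$[\![\ ]\!]^* : N^* \to (\mathit{STACK} \to \mathit{STACK})$$

This functional is defined as follows:

$$\forall n \in N^*\ \forall stk \in \mathit{STACK}.\ [\![n]\!]^*(stk) =_{df}
\begin{cases}
push(pop(stk),\ [\![n]\!]'(top(stk))) & \text{if } n \in N^* \setminus (N^*_C \cup N^*_R) \\
push(stk,\ [\![n]\!]'(top(stk))) & \text{if } n \in N^*_C \\
push(pop(pop(stk)),\ \mathcal{R}(n)(top(pop(stk)),\ [\![n]\!]'(top(stk)))) & \text{if } n \in N^*_R
\end{cases}$$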


The intuition behind its definition is as follows:

The execution of an ordinary statement (i.e., $n \in N^* \setminus (N^*_C \cup N^*_R)$) affects only the currently valid activation record. Thus, it can be modeled by simply modifying the top entry of the stack, which represents the current data flow information.

A procedure call (i.e., $n \in N^*_C$) requires the generation of a new activation record. This is reflected by pushing a new element onto the top of the stack, which results from modifying the top entry of the stack according to the parameter transfer.

The treatment of return statements (i.e., $n \in N^*_R$) demonstrates the necessity of introducing stacks into the framework. Returning from a procedure call requires removing the activation record belonging to the called procedure and reactivating its predecessor. Here, the following observation is important: the effect of a (directly) recursive procedure on a global variable needs to be maintained, whereas the local variables must be reset to their values at call time. Thus we need to consider the data flow information valid immediately before entering the procedure (available in $top(pop(stk))$) as well as the information valid after executing its body (available in $[\![n]\!]'(top(stk))$) in order to compute the data flow information that is valid after returning from the called procedure. The function $\mathcal{R}(n) : \mathcal{C} \times \mathcal{C} \to \mathcal{C}$ models this computation. Thus, popping the top entry of the stack and replacing the subsequent entry by

$$\mathcal{R}(n)\big(top(pop(stk)),\ [\![n]\!]'(top(stk))\big)$$

reflects the whole process of completing a procedure call. Note that the functions $[\![n]\!]^*$ for nodes $n \in N^*_R$ are only defined for stacks with at least two entries. This fact is automatically taken care of in any reasonable analysis context.
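The case distinction defining $[\![n]\!]^*$ can be transcribed directly. In this Python sketch, stacks are tuples with the top entry last, lattice elements are frozensets of “variable is defined” facts, and the node kinds, the set `GLOBALS`, the parameter transfer, and the return function `R` are hypothetical choices. `R` mirrors the observation above: effects on global variables are kept, and local information is reset to its value at call time.

```python
from typing import Callable, Optional, Tuple

Stack = Tuple[frozenset, ...]  # DFA-stack, top entry last
Local = Callable[[frozenset], frozenset]
Ret = Callable[[frozenset, frozenset], frozenset]


def lift(kind: str, local: Local, ret: Optional[Ret] = None) -> Callable[[Stack], Stack]:
    """Sketch of [[n]]* for a node of the given kind with local semantics [[n]]' = local."""
    def f(stk: Stack) -> Stack:
        c = local(stk[-1])                  # [[n]]'(top(stk))
        if kind == "ordinary":              # push(pop(stk), c)
            return stk[:-1] + (c,)
        if kind == "call":                  # push(stk, c)
            return stk + (c,)
        if kind == "return":                # push(pop(pop(stk)), R(n)(top(pop(stk)), c))
            if len(stk) < 2:
                raise ValueError("return nodes need stacks with at least two entries")
            return stk[:-2] + (ret(stk[-2], c),)
        raise ValueError(f"unknown node kind: {kind}")
    return f


GLOBALS = frozenset({"g", "h"})  # hypothetical global variables


def R(c_call: frozenset, c_exit: frozenset) -> frozenset:
    """Hypothetical return function: facts on globals come from the callee's exit,
    facts on locals are reset to the caller's information before the call."""
    return (c_exit & GLOBALS) | (c_call - GLOBALS)


call = lift("call", lambda c: c & GLOBALS)         # only globals are visible in the callee
body = lift("ordinary", lambda c: c | {"h", "y"})  # callee defines global h and local y
back = lift("return", lambda c: c, R)

s0 = (frozenset({"g", "x"}),)  # caller: g and its local x are defined
s1 = call(s0)                  # two entries: caller record below, callee record on top
s3 = back(body(s1))
print(len(s1), sorted(s3[-1]), len(s3))  # 2 ['g', 'h', 'x'] 1
```

The callee's local `y` disappears on return, while the caller's local `x` becomes visible again, exactly as the stack discipline intends.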

8.2.1 The Structure of the Semantic Functions 

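Let $\mathcal{F} =_{df} [\mathit{STACK} \to \mathit{STACK}]$ denote the set of all functions from STACK to STACK, let $\mathit{STACK}_{\ge 2}$ denote the set of all DFA-stacks with at least two entries, and let

$$\mathcal{F}_O =_{df} \{ f \in \mathcal{F} \mid \forall stk \in \mathit{STACK}.\ pop(f(stk)) = pop(stk) \}$$

$$\mathcal{F}_C =_{df} \{ f \in \mathcal{F} \mid \forall stk \in \mathit{STACK}.\ pop(f(stk)) = stk \}$$

$$\mathcal{F}_R =_{df} \{ f \in \mathcal{F} \mid \forall stk \in \mathit{STACK}_{\ge 2}.\ pop(f(stk)) = pop(pop(stk)) \}$$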

Then we have: 

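Lemma 8.2.1.

1. $\forall n \in N^* \setminus (N^*_C \cup N^*_R).\ [\![n]\!]^* \in \mathcal{F}_O$

2. $\forall n \in N^*_C.\ [\![n]\!]^* \in \mathcal{F}_C$

3. $\forall n \in N^*_R.\ [\![n]\!]^* \in \mathcal{F}_R$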

Intuitively, this means that the semantic function of an ordinary statement 
only affects the top entry of the argument stack, that the semantic function 
of a call statement adds a new top entry to the argument stack, and that a 
return statement replaces the upper two entries of the argument stack by a 
new component. As a consequence we have the following lemma. 

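Lemma 8.2.2. $\forall f_R \in \mathcal{F}_R\ \forall f_O, f_O' \in \mathcal{F}_O\ \forall f_C \in \mathcal{F}_C.\ f_O \circ f_O',\ f_R \circ f_O \circ f_C \in \mathcal{F}_O$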
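The three classes and Lemma 8.2.2 can be explored on sample stacks. This Python sketch represents stacks as tuples with the top entry last and uses hypothetical integer-valued stack functions. It checks the defining equations only on finitely many stacks, so it provides evidence rather than a proof. The last check illustrates the composition $f_R \circ f_O \circ f_C$.

```python
def in_F_O(f, samples):
    """pop(f(stk)) = pop(stk): f may change the top entry only."""
    return all(f(s)[:-1] == s[:-1] for s in samples)


def in_F_C(f, samples):
    """pop(f(stk)) = stk: f pushes a new top entry onto stk."""
    return all(f(s)[:-1] == s for s in samples)


def in_F_R(f, samples):
    """pop(f(stk)) = pop(pop(stk)) for stacks with at least two entries."""
    return all(f(s)[:-1] == s[:-2] for s in samples if len(s) >= 2)


# Hypothetical stack functions on integer entries (top entry last)
f_O = lambda s: s[:-1] + (s[-1] + 1,)      # modify the top entry
f_C = lambda s: s + (s[-1],)               # push a copy of the top entry
f_R = lambda s: s[:-2] + (s[-2] + s[-1],)  # merge the two top entries into one

samples = [(1,), (1, 2), (3, 4, 5)]
composed = lambda s: f_R(f_O(f_C(s)))      # f_R o f_O o f_C as in Lemma 8.2.2

print(in_F_O(f_O, samples), in_F_C(f_C, samples), in_F_R(f_R, samples))  # True True True
print(in_F_O(composed, samples), in_F_O(f_C, samples))                    # True False
```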
Next we introduce derived notions of monotonicity and distributivity for functions on stacks. They are required for the formal development of our framework and are based on the “significant part” of a function of $\mathcal{F}_O$, $\mathcal{F}_C$, and $\mathcal{F}_R$. Given a function $f \in \mathcal{F}_O \cup \mathcal{F}_C \cup \mathcal{F}_R$, its significant part $f_s$ is defined according to the following two cases:

- $f \in \mathcal{F}_O \cup \mathcal{F}_C$: then $f_s : \mathcal{C} \to \mathcal{C}$ is defined by:

$$f_s(c) =_{df} top(f(newstack(c)))$$

- $f \in \mathcal{F}_R$: then $f_s : \mathcal{C} \times \mathcal{C} \to \mathcal{C}$ is defined by:⁴

$$f_s(c_1, c_2) =_{df} top(f(push(newstack(c_1), c_2)))$$

Now we can define:

Definition 8.2.1 (s-Monotonicity, s-Distributivity).

A function $f \in \mathcal{F}_O \cup \mathcal{F}_C \cup \mathcal{F}_R$ is called

1. s-monotonic if and only if $f_s$ is monotonic

2. s-distributive if and only if $f_s$ is distributive
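Definition 8.2.1 can be checked exhaustively on a small finite lattice. In this Python sketch, the lattice (all subsets of {a, b}, ordered by inclusion, with intersection as meet) and the three stack functions are hypothetical. $f_s$ is obtained exactly as in the two cases above, by running $f$ on a one- or two-entry stack and reading off the top entry.

```python
from itertools import combinations

# Hypothetical finite lattice: all subsets of {"a", "b"}, meet = intersection
UNIVERSE = ("a", "b")
ELEMS = [frozenset(s) for r in range(len(UNIVERSE) + 1)
         for s in combinations(UNIVERSE, r)]


def sig_unary(f):
    """Significant part of f in F_O or F_C: c -> top(f(newstack(c)))."""
    return lambda c: f((c,))[-1]


def sig_binary(f):
    """Significant part of f in F_R: (c1, c2) -> top(f(push(newstack(c1), c2)))."""
    return lambda c1, c2: f((c1, c2))[-1]


def monotonic_unary(g):
    return all(g(x) <= g(y) for x in ELEMS for y in ELEMS if x <= y)


def distributive_unary(g):
    # binary meets suffice on this finite lattice
    return all(g(x & y) == g(x) & g(y) for x in ELEMS for y in ELEMS)


def distributive_binary(g):
    pairs = [(x, y) for x in ELEMS for y in ELEMS]  # C x C with componentwise meet
    return all(g(p[0] & q[0], p[1] & q[1]) == g(*p) & g(*q)
               for p in pairs for q in pairs)


gen_kill = lambda s: s[:-1] + ((s[-1] - {"b"}) | {"a"},)                     # in F_O
any_ab = lambda s: s[:-1] + (frozenset({"z"}) if s[-1] else frozenset(),)  # in F_O
merge = lambda s: s[:-2] + (s[-2] & s[-1],)                                 # in F_R

print(distributive_unary(sig_unary(gen_kill)))                                   # True
print(monotonic_unary(sig_unary(any_ab)), distributive_unary(sig_unary(any_ab)))  # True False
print(distributive_binary(sig_binary(merge)))                                    # True
```

`any_ab` is s-monotonic but not s-distributive: the meet of {a} and {b} is empty, so the fact z is lost on the meet, although it holds for each argument separately.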

Intuitively, the significant part of a local semantic function $[\![n]\!]^*$ is given by the underlying basic local semantic function $[\![n]\!]'$ and, in case of a return node, by the return function $\mathcal{R}(n)$. Thus, we have:

Lemma 8.2.3. For all $n \in N^*$, $[\![n]\!]^*$ is s-monotonic (s-distributive) if

$n \in N^*_R$: $[\![n]\!]'$ and $\mathcal{R}(n)$ are monotonic (distributive)

$n \notin N^*_R$: $[\![n]\!]'$ is monotonic (distributive)

This lemma is important because it shows that the effort for checking the premises of the Interprocedural Correctness Theorem 8.4.1 and the Interprocedural Coincidence Theorem 8.4.2 is comparable to the effort necessary for checking the premises of their intraprocedural counterparts (cf. Section 2.2.4). In fact, only the return functions $\mathcal{R}(n)$, $n \in N^*_R$, must additionally be investigated.

Conventions. In the following we consider s-monotonicity and s-distributivity as generalizations of the usual monotonicity and distributivity by identifying lattice elements with their unique representations as one-entry stacks. Moreover, we extend the meet operation $\sqcap$ to work on stacks in the following way:

$$\forall STK \subseteq \mathit{STACK}.\ \sqcap STK =_{df} newstack(\sqcap \{ top(stk) \mid stk \in STK \})$$

Thus, the meet over a set of stacks is the one-entry stack containing the meet of all the top entries in its single entry.
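The stack meet discards everything below the top entries. Here is a minimal Python sketch, assuming a lattice of frozensets with intersection as meet (a hypothetical choice). The greatest element is passed explicitly so that the meet over an empty set of stacks is defined.

```python
from functools import reduce


def stack_meet(stacks, meet, top_elem):
    """Meet over a set of DFA-stacks: a one-entry stack holding the meet of all top
    entries. For an empty set the result is newstack(top_elem), the greatest element."""
    return (reduce(meet, (s[-1] for s in stacks), top_elem),)


# Hypothetical lattice: subsets of {a, b, c, x, y} with intersection as meet
UNIV = frozenset("abcxy")
stks = [(frozenset("xy"), frozenset("ab")), (frozenset("ac"),)]
print(sorted(stack_meet(stks, frozenset.__and__, UNIV)[0]))  # ['a']
```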

⁴ Note that $\mathcal{C} \times \mathcal{C}$ is a lattice whenever $\mathcal{C}$ is.
8.3 Global Abstract Semantics



The global abstract semantics of an interprocedural flow graph G* and its 
underlying flow graph system S results from the interprocedural extensions 
of the meet over all paths approach and the maximal fixed point approach, 
respectively. 



8.3.1 The Interprocedural Meet Over All Paths Approach 

Like its intraprocedural counterpart, the interprocedural meet over all paths (IMOP) solution records the effect of all possible program executions reaching a particular program point. To this end the local abstract semantics $[\![\ ]\!]^*$ must be extended to cover finite interprocedural paths. For every path $p \in IP[m, n]$, we define $[\![p]\!]^* : \mathit{STACK} \to \mathit{STACK}$ by

$$[\![p]\!]^* =_{df} \begin{cases} Id_{\mathit{STACK}} & \text{if } p = \varepsilon \\ [\![p[2, \lambda_p]]\!]^* \circ [\![p_1]\!]^* & \text{otherwise} \end{cases}$$
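The recursive definition of $[\![p]\!]^*$ unfolds into a loop over the nodes of $p$ in path order: the function of the first node $p_1$ is applied first. Below is a short Python sketch with hypothetical integer-valued node functions, chosen so that the order of composition is visible.

```python
def path_semantics(path, sem):
    """[[p]]* for a finite path p: the identity for the empty path, otherwise
    [[p[2, lambda_p]]]* o [[p_1]]*, i.e. node functions applied in path order."""
    def run(stk):
        for node in path:
            stk = sem[node](stk)
        return stk
    return run


# Hypothetical node semantics on integer-valued stacks (top entry last)
sem = {
    "inc": lambda s: s[:-1] + (s[-1] + 1,),
    "dbl": lambda s: s[:-1] + (s[-1] * 2,),
}
print(path_semantics(("inc", "dbl"), sem)((3,)))  # (8,)
print(path_semantics(("dbl", "inc"), sem)((3,)))  # (7,)
print(path_semantics((), sem)((3,)))              # (3,)
```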



Considering an interprocedural path $p \in IP[s^*, n[$ and a DFA-stack $stk \in \mathit{STACK}$, it is important to note that the data flow information which is relevant for node $n$ after executing $p$ is given by the element $top([\![p]\!]^*(stk))$. In fact, all other entries of $[\![p]\!]^*(stk)$ correspond to activation records which are invisible after $p$. Identifying one-entry stacks with the content of their single entry, the interprocedural meet over all paths solution is formally defined by:⁵

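Definition 8.3.1 (The IMOP-Solution).

Given an interprocedural flow graph $G^*$, a complete semi-lattice $\mathcal{C}$, a local abstract semantics $[\![\ ]\!]'$, and a return functional $\mathcal{R}$, the IMOP-solution is defined by:⁶

$$\forall c_s \in \mathcal{C}\ \forall n \in N^S.\ \mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n) =_{df} \big(N\text{-}\mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n),\ X\text{-}\mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n)\big)$$

where

$$N\text{-}\mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n) =_{df} \sqcap \{ [\![p]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n[ \}$$

$$X\text{-}\mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n) =_{df} \sqcap \{ [\![p]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n] \}$$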
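For a finite set of paths, the meet in Definition 8.3.1 can be evaluated directly. In the Python sketch below, the node semantics, the set `GLOBALS`, and the two paths are hypothetical. In general $IP[s^*, n[$ is infinite for programs with loops or recursion, so such an enumeration only illustrates the specification and is not an algorithm. This gap is what the IMFP-approach addresses.

```python
from functools import reduce

GLOBALS = frozenset({"g"})  # hypothetical global variable

# Hypothetical stack semantics [[n]]* of nodes of G* (tuples, top entry last)
sem = {
    "def_x": lambda s: s[:-1] + (s[-1] | {"x"},),     # ordinary node of the caller
    "def_g": lambda s: s[:-1] + (s[-1] | {"g"},),     # ordinary node of the callee
    "call": lambda s: s + (s[-1] & GLOBALS,),         # call node: push callee record
    "ret": lambda s: s[:-2] + ((s[-1] & GLOBALS) | (s[-2] - GLOBALS),),  # return node
}


def run(path, stk):
    for node in path:
        stk = sem[node](stk)
    return stk


def n_imop(paths, c_s):
    """Meet of top([[p]]*(newstack(c_s))) over the given paths; lower entries are ignored."""
    return reduce(frozenset.__and__, (run(p, (c_s,))[-1] for p in paths))


paths_to_n = [("def_x",), ("call", "def_g", "ret")]  # a finite set of paths reaching n
print(sorted(n_imop(paths_to_n, frozenset({"x"}))))  # ['x']
```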

Identifying a node $n \in N^S_c$ with the set of nodes $\{ n_C(\iota) \mid \iota \in callee(n) \}$, Definition 8.3.1 yields:

$$N\text{-}\mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n) = \sqcap \{ [\![p]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n_C(\iota)[ \,\wedge\, \iota \in callee(n) \}$$

$$X\text{-}\mathit{IMOP}_{([\![\ ]\!]^*, c_s)}(n) = \sqcap \{ [\![p]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n_C(\iota)] \,\wedge\, \iota \in callee(n) \}$$

This is particularly important for nodes of $N^S_c$ when dealing with backward analyses, which are treated essentially by inverting the flow of control in our framework (cf. Section 8.7).

⁵ As in the intraprocedural case, “N” and “X” stand for entry and exit of a node $n$.

⁶ Recall that $\mathcal{C}$, $[\![\ ]\!]'$ and $\mathcal{R}$ induce the local semantic functional $[\![\ ]\!]^*$.



8.3.2 The Interprocedural Maximal Fixed Point Approach 

In contrast to a flow graph of Chapter 2, which represents a single procedure without procedure calls, a flow graph system, which represents a complete program with procedures and procedure calls, needs a preprocess for determining the meaning of call nodes in terms of the meaning of the called procedures. This requires the introduction of an auxiliary semantic functional $\|\ \|$, which gives meaning to whole flow graphs. Intuitively, $\|n\|$ transforms data flow information which is assumed to be valid at the entry of the procedure containing $n$ into the corresponding data flow information which is valid before an execution of $n$. The function $\|e_i\|$ is then the meaning function of the $i$-th procedure.⁷ Formally, the full preprocess for determining the meaning $[\![n]\!]$ of call nodes $n \in N^S_c$ is characterized by:

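Definition 8.3.2 (Global Semantics of Flow Graphs).

$\|\ \| : N^S \to (\mathit{STACK} \to \mathit{STACK})$ and $[\![\ ]\!] : N^S \to (\mathit{STACK} \to \mathit{STACK})$ are defined as the greatest solution of the equation system given by:

$$\|n\| = \begin{cases} Id_{\mathit{STACK}} & \text{if } n \in \{s_0, \ldots, s_k\} \\ \sqcap \{ [\![m]\!] \circ \|m\| \mid m \in pred_{fg(n)}(n) \} & \text{otherwise} \end{cases}$$

and

$$[\![n]\!] = \begin{cases} [\![n]\!]^* & \text{if } n \in N^S \setminus N^S_c \\ \sqcap \{ [\![n_R(\iota)]\!]^* \circ \|end(\iota)\| \circ [\![n_C(\iota)]\!]^* \mid \iota \in callee(n) \} & \text{otherwise} \end{cases}$$

where $Id_{\mathit{STACK}}$ denotes the identity on STACK, and $\sqcap$ the “componentwise” meet operation on $\mathcal{F}_O$.⁸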

Intuitively, the semantics of a procedure call node $n \in N^S_c$ is the meet over the effects of all procedures in $callee(n)$. For a single procedure $\iota \in callee(n)$ the effect of calling $\iota$ is computed in three steps, which reflect the three phases of executing $\iota$:

- Entering the called procedure: $[\![n_C(\iota)]\!]^*$ creates a new activation record by transforming the content of the top entry of the stack according to the semantics of the call node and by pushing it onto the stack. Usually, the semantics of call nodes reflects the parameter transfer.

- Evaluating the call: $\|end(\iota)\|$ computes the effect of the procedure body. Note that this affects the top entry of the argument stack only.

- Leaving the called procedure: $[\![n_R(\iota)]\!]^*$ removes the activation record of the current procedure call by popping the top entry from the stack, and replacing its subsequent entry by the data flow information representing the effect of the procedure call relative to its call site.

⁷ Recall that $[\![e_i]\!]' =_{df} Id_{\mathcal{C}}$. Thus, we have: $[\![e_i]\!] \circ \|e_i\| = \|e_i\|$.

⁸ $f \sqcap f' =_{df} f'' \in \mathcal{F}_O$ with $\forall stk \in \mathit{STACK}.\ top(f''(stk)) = top(f(stk)) \sqcap top(f'(stk))$. As usual, $\sqcap$ induces an inclusion relation $\sqsubseteq$ on $\mathcal{F}_O$ by: $f \sqsubseteq f'$ iff $f \sqcap f' = f$.
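The three phases compose into the meaning of a call node, and the effects of several possible callees are combined by the componentwise meet on $\mathcal{F}_O$. This Python sketch assumes that the callee summaries $\|end(\iota)\|$ are already known; in the framework they come from the greatest solution of Definition 8.3.2. The call, return, and body functions are hypothetical.

```python
from functools import reduce

GLOBALS = frozenset({"g"})  # hypothetical global variable


def meet_FO(f, h):
    """Componentwise meet on F_O: keep the (common) lower entries, intersect the top entries."""
    return lambda s: s[:-1] + (f(s)[-1] & h(s)[-1],)


def call_effect(enter, body, leave):
    """Effect of calling one procedure: [[n_R]]* o ||end|| o [[n_C]]*."""
    return lambda s: leave(body(enter(s)))


def call_node_semantics(callees):
    """[[n]] of a call node: the meet over the effects of all procedures in callee(n)."""
    return reduce(meet_FO, (call_effect(*c) for c in callees))


enter = lambda s: s + (s[-1] & GLOBALS,)                              # [[n_C]]*
leave = lambda s: s[:-2] + ((s[-1] & GLOBALS) | (s[-2] - GLOBALS),)  # [[n_R]]*
body_1 = lambda s: s[:-1] + (s[-1] | {"g"},)  # ||end|| of a callee that defines g
body_2 = lambda s: s                          # ||end|| of a callee that does not

n_sem = call_node_semantics([(enter, body_1, leave), (enter, body_2, leave)])
print(n_sem((frozenset({"x"}),)))  # (frozenset({'x'}),)
```

Since only one of the two possible callees defines `g`, the meet does not report `g` as defined after the call.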

Applying Lemma 8.2.2, we obtain:

Lemma 8.3.1. $\forall n \in N^S.\ \|n\|, [\![n]\!] \in \mathcal{F}_O$

After fixing the meaning of call nodes, the functional $[\![\ ]\!]$ plays essentially the same role as the local (abstract) semantic functional of Section 2.2.3. Formally, the interprocedural maximal fixed point approach is characterized by Equation System 8.3.3. Similar to its intraprocedural counterpart, this approach labels every node $n$ of $N^S$ with a pre-information $pre(n)$ and a post-information $post(n)$, whose top entries are the greatest solution of this equation system with respect to a start information $c_s \in \mathcal{C}$.

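Equation System 8.3.3.

$$pre(n) = \begin{cases} newstack(c_s) & \text{if } n = s_1 \\ \sqcap \{ [\![m_C(fg(n))]\!]^*(pre(m)) \mid m \in caller(fg(n)) \} & \text{if } n \in \{s_0, s_2, \ldots, s_k\} \\ \sqcap \{ post(m) \mid m \in pred_{fg(n)}(n) \} & \text{otherwise} \end{cases}$$

$$post(n) = [\![n]\!](pre(n))$$

Identifying stacks with a single entry only with the content of this entry, and denoting the greatest solution of Equation System 8.3.3 with respect to the start information $c_s$ by $pre_{c_s}$ and $post_{c_s}$, respectively, we define analogously to the intraprocedural case:

Definition 8.3.4 (The IMFP-Solution).

Given a flow graph system $S$, a complete semi-lattice $\mathcal{C}$, a local abstract semantics $[\![\ ]\!]'$, and a return functional $\mathcal{R}$, the IMFP-solution is defined by:⁹

$$\forall c_s \in \mathcal{C}\ \forall n \in N^S.\ \mathit{IMFP}_{([\![\ ]\!]^*, c_s)}(n) =_{df} \big(N\text{-}\mathit{IMFP}_{([\![\ ]\!]^*, c_s)}(n),\ X\text{-}\mathit{IMFP}_{([\![\ ]\!]^*, c_s)}(n)\big)$$

where

$$N\text{-}\mathit{IMFP}_{([\![\ ]\!]^*, c_s)}(n) =_{df} pre_{c_s}(n) \quad\text{and}\quad X\text{-}\mathit{IMFP}_{([\![\ ]\!]^*, c_s)}(n) =_{df} post_{c_s}(n)$$

⁹ Recall that $\mathcal{C}$, $[\![\ ]\!]'$ and $\mathcal{R}$ induce the local semantic functional $[\![\ ]\!]^*$.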
Remark 8.3.1 (Limited Stack Size in the IMFP-Approach).

Equation System 8.3.3 shows that the IMFP-solution is based on a one-entry stack attached to the start node $s_1$ of the argument program. This is important because, together with Lemma 8.3.1, it guarantees that all stacks occurring during the iterative computation of the IMFP-solution have at most two entries (cf. Algorithm 8.5.2). This allows us to prove termination in the usual way. It also marks a sharp contrast to the IMOP-approach defining the specifying solution of an IDFA-problem: the size of DFA-stacks contributing to the IMOP-solution is not limited in general.

In analogy to the intraprocedural case, we consider the IMOP-approach as a means for the direct specification of an IDFA, and the IMFP-approach as its algorithmic realization. Consequently, this raises the questions of correctness and precision, which are answered by means of the Interprocedural Correctness Theorem 8.4.1 and the Interprocedural Coincidence Theorem 8.4.2.
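To make the algorithmic reading concrete, the following Python sketch solves only the intraprocedural part of Equation System 8.3.3 by round-robin iteration. It assumes a single procedure whose node semantics $[\![n]\!]$, including those of call nodes, have already been determined by the preprocess of Definition 8.3.2. It is not Algorithm 8.5.2 of the text. As Remark 8.3.1 predicts, all stacks stay one-entry tuples. The procedure, its node functions, and the frozenset lattice with intersection as meet are hypothetical.

```python
from functools import reduce


def solve_pre_post(nodes, preds, sem, start, c_s, greatest):
    """Greatest solution of the start and 'otherwise' cases of Equation System 8.3.3
    for one procedure with known node semantics sem[n] = [[n]] (one-entry stacks)."""
    pre = {n: (greatest,) for n in nodes}
    post = {n: sem[n](pre[n]) for n in nodes}
    changed = True
    while changed:  # iterate until no pre-information changes any more
        changed = False
        for n in nodes:
            if n == start:
                new_pre = (c_s,)
            else:
                new_pre = (reduce(frozenset.__and__,
                                  (post[m][-1] for m in preds[n]), greatest),)
            if new_pre != pre[n]:
                pre[n], changed = new_pre, True
            post[n] = sem[n](pre[n])
    return pre, post


# Hypothetical procedure with a loop: s -> a -> b -> a, and b -> e
nodes = ["s", "a", "b", "e"]
preds = {"s": [], "a": ["s", "b"], "b": ["a"], "e": ["b"]}
sem = {
    "s": lambda st: st,
    "a": lambda st: st[:-1] + (st[-1] | {"x"},),
    "b": lambda st: st[:-1] + (st[-1] | {"y"},),
    "e": lambda st: st,
}
pre, post = solve_pre_post(nodes, preds, sem, "s", frozenset(), frozenset({"x", "y"}))
print({n: sorted(pre[n][0]) for n in nodes})  # {'s': [], 'a': [], 'b': ['x'], 'e': ['x', 'y']}
```

Starting from the greatest element and decreasing monotonically is what yields the greatest solution required by the equation system.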

8.4 IMOP-Correctness and IMOP-Precision

In this section we present the interprocedural versions of the Correctness Theorem 2.2.1 and the Coincidence Theorem 2.2.2. In analogy to their intraprocedural counterparts, the Interprocedural Correctness Theorem 8.4.1 and the Interprocedural Coincidence Theorem 8.4.2 give sufficient conditions for the correctness and the precision of the IMFP-solution with respect to the IMOP-solution, respectively. Central for proving these theorems is the Main Lemma 8.4.3, which is presented in the following section.

8.4.1 The Main Lemma 

Throughout this section we assume that the semantic functions $[\![n]\!]^*$, $n \in N^*$, are s-monotonic or s-distributive. Obviously, the composition and the meet of s-monotonic (s-distributive) functions are again s-monotonic (s-distributive). Thus we have:

Lemma 8.4.1. The semantic functions $[\![n]\!]$ and $\|n\|$, $n \in N^S$, are s-monotonic (s-distributive) iff the semantic functions $[\![m]\!]^*$, $m \in N^*$, are s-monotonic (s-distributive).

Next we define for every node $n$ of $G^*$ a mapping $imop_n : \mathit{STACK} \to \mathit{STACK}$ by

$$imop_n =_{df} \begin{cases} Id_{\mathit{STACK}} & \text{if } n \in \{s_0, \ldots, s_k\} \\ \sqcap \{ [\![p]\!]^* \mid p \in CIP[start(fg(n)), n[ \} & \text{otherwise} \end{cases}$$

We have: 

Lemma 8.4.2. For all $n \in N^S$, we have, if the semantic functions $[\![m]\!]^*$, $m \in N^*$, are

1. s-monotonic: $\|n\| \sqsubseteq imop_n$

2. s-distributive: $\|n\| = imop_n$

Proof. The first part of Lemma 8.4.2, $\|n\| \sqsubseteq imop_n$, is an immediate consequence of the formula

$$(*)\quad \forall n \in N^S\ \forall p \in CIP[start(fg(n)), n[.\ \|n\| \sqsubseteq [\![p]\!]^*$$

which we prove by induction on the length $k$ of path $p$. Paths of length 0 only reach nodes in $\{s_0, s_1, \ldots, s_k\}$, for which (*) is trivial. Moreover, paths of length 0 are the only complete interprocedural paths ending in a node $s \in \{s_0, s_1, \ldots, s_k\}$. Thus, let $n \in N^S \setminus \{s_0, s_1, \ldots, s_k\}$ such that there is a path $p \in CIP[start(fg(n)), n[$ with $0 < \lambda_p \le k$. Then we must show

$$[\![p]\!]^* \sqsupseteq \|n\|$$

under the induction hypothesis

$$(IH)\quad \forall m \in N^S\ \forall q \in CIP[start(fg(m)), m[.\ (0 < \lambda_q < k) \Rightarrow [\![q]\!]^* \sqsupseteq \|m\|$$

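Let $m \in pred^*(n)$ and $p' \in IP[start(fg(n)), m[$ such that $p = p' ; (m)$. If $m \in N^* \setminus (N^*_C \cup N^*_R)$, we have $m \notin N^S_c$ and $p' \in CIP[start(fg(n)), m[$. Therefore, we obtain as desired:

$$\begin{aligned}
[\![p]\!]^* &= [\![m]\!]^* \circ [\![p']\!]^* \\
&= [\![m]\!] \circ [\![p']\!]^* && (m \notin N^S_c) \\
&\sqsupseteq [\![m]\!] \circ \|m\| && (\text{IH, s-monotonicity}) \\
&\sqsupseteq \|n\| && (\text{Definition 8.3.2 and } n \notin \{s_0, s_1, \ldots, s_k\})
\end{aligned}$$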



On the other hand, $m \in N^*_C$ cannot occur because this would imply $n \in \{s_0, s_1, \ldots, s_k\}$, in contradiction to the choice of $n$. Thus we are left with the case $m \in N^*_R$, which is problematic because here $m$ is not the predecessor of $n$ in $N^S$; rather, it is the return node $\hat m_R(\iota)$ for some node $\hat m \in pred_{fg(n)}(n)$ with $\iota \in callee(\hat m)$. This situation, which does not allow the direct application of the induction hypothesis, can be pictured as follows:



[Figure 8.4.1: diagram of $G^*$ and $S$; labels $\hat m_C(\iota)$, $m = \hat m_R(\iota)$, $n$, $start(\iota)$, $end(\iota)$]

Figure 8.4.1.
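Identifying, in the situation of Figure 8.4.1, the nodes $\hat m$ and $\hat m_C(\iota)$, and considering the decomposition of path $p$ with

$$p = \hat p \,;\, (\hat m_C(\iota)) \,;\, (start(\iota), \ldots, end(\iota)) \,;\, (\hat m_R(\iota))$$

we obtain by means of Lemma 7.2.2

$$\hat p \in CIP[start(fg(n)), \hat m[ \quad\text{and}\quad (start(\iota), \ldots, end(\iota)) \in CIP[start(\iota), end(\iota)]$$

The first inclusion can therefore be completed by:¹⁰

$$\begin{aligned}
[\![p]\!]^* &= [\![\hat m_R(\iota)]\!]^* \circ [\![(start(\iota), \ldots, end(\iota))]\!]^* \circ [\![\hat m_C(\iota)]\!]^* \circ [\![\hat p]\!]^* \\
&\sqsupseteq [\![\hat m_R(\iota)]\!]^* \circ \|end(\iota)\| \circ [\![\hat m_C(\iota)]\!]^* \circ [\![\hat p]\!]^* && (\text{IH, s-mon.}) \\
&\sqsupseteq [\![\hat m]\!] \circ [\![\hat p]\!]^* && (\text{Def. 8.3.2}) \\
&\sqsupseteq [\![\hat m]\!] \circ \|\hat m\| && (\text{IH, s-mon.}) \\
&\sqsupseteq \|n\| && (\text{Def. 8.3.2})
\end{aligned}$$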





In order to prove the second part of Lemma 8.4.2, we anticipate Algorithm 8.5.1 and Theorem 8.5.1. This, however, does not cause any subtleties, as they are independent of the results of Section 8.4. The remaining inclusion of the second part of Lemma 8.4.2, $\|n\| \sqsupseteq imop_n$, is then a consequence of

$$(**)\quad \forall k \ge 0\ \forall n \in N^S.\ (1)\ imop_n \sqsubseteq gtr^k[n] \ \text{ and }\ (2)\ [\![n]\!] \sqsubseteq ltr^k[n]$$

which is proved by (simultaneous) induction on the number $k$ of times the while-loop in Algorithm 8.5.1 is executed.¹¹

The induction base is an immediate consequence of the initialization part of the algorithm. Thus let $k > 0$, and let us consider the $k$-th iteration of the while-loop under the induction hypothesis (IH) that (**) holds before the $k$-th execution of the while-loop. To show (1), we only need to consider the node $n \in N^S$ which belongs to the workset entry $(n, f)$ being processed in the $k$-th iteration, because the gtr-information associated with all the other nodes remains unchanged in this iteration and therefore satisfies (**) according to (IH).

If $n = s \in \{s_0, \ldots, s_k\}$, we have $f = Id_{\mathit{STACK}}$, since

$$\{ (s, Id_{\mathit{STACK}}) \mid s \in \{s_0, \ldots, s_k\} \}$$

is initially a subset of the workset and nodes in $\{s_0, \ldots, s_k\}$ are never added again to the workset. Thus, we have as desired

$$imop_s = Id_{\mathit{STACK}} = gtr^k[s]$$

¹⁰ Remember that $[\![end(\iota)]\!] = Id_{\mathit{STACK}}$.

¹¹ We remark that $ltr^k[n]$ and $gtr^k[n]$ denote the values of $ltr[n]$ and $gtr[n]$ after the $k$-th execution of the while-loop of Algorithm 8.5.1, respectively.
This leaves us with the case $n \in N^S \setminus \{s_0, \ldots, s_k\}$. Thus there exist $m \in pred_{fg(n)}(n)$ and $k > l \ge 0$ such that the pair

$$(n, f) = (n,\ ltr^l[m] \circ gtr^l[m])$$

was added to the workset in the $l$-th iteration.¹² This allows us to deduce $imop_n \sqsubseteq f$ by investigating the following two cases:

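Case 1. $m \notin N^S_c$. Here we have:

$$\begin{aligned}
imop_n &= \sqcap \{ [\![p]\!]^* \mid p \in CIP[start(fg(n)), n[ \} \\
&\sqsubseteq \sqcap \{ [\![m]\!]^* \circ [\![p]\!]^* \mid p \in CIP[start(fg(n)), m[ \} && (m \notin N^S_c) \\
&= [\![m]\!]^* \circ \sqcap \{ [\![p]\!]^* \mid p \in CIP[start(fg(n)), m[ \} && (\text{s-distributivity}) \\
&= [\![m]\!]^* \circ imop_m && (\text{Def. } imop_m) \\
&\sqsubseteq [\![m]\!]^* \circ gtr^l[m] && (\text{IH, s-monotonicity}) \\
&= ltr^l[m] \circ gtr^l[m] = f && (m \notin N^S_c)
\end{aligned}$$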



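Case 2. $m \in N^S_c$.

In this case we obtain by means of the s-distributivity and the s-monotonicity:¹³

$$\begin{aligned}
imop_n &= \sqcap \{ [\![p]\!]^* \mid p \in CIP[start(fg(n)), n[ \} \\
&\sqsubseteq \sqcap \{ [\![m_R(\iota)]\!]^* \circ [\![p']\!]^* \circ [\![m_C(\iota)]\!]^* \circ [\![p]\!]^* \mid \iota \in callee(m),\ p' \in CIP[start(\iota), end(\iota)],\ p \in CIP[start(fg(m)), m[ \} && (m \in N^S_c) \\
&\sqsubseteq \sqcap \{ [\![m_R(\iota)]\!]^* \circ gtr^l[end(\iota)] \circ [\![m_C(\iota)]\!]^* \circ gtr^l[m] \mid \iota \in callee(m) \} && (\text{IH}) \\
&= ltr^l[m] \circ gtr^l[m] = f && (\text{Prop. 8.5.1(3)})
\end{aligned}$$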



Thus, in both cases we uniformly have $imop_n \sqsubseteq f$. Together with the induction hypothesis this yields inclusion

$$(1)\quad imop_n \sqsubseteq gtr^{k-1}[n] \sqcap f = gtr^k[n].$$

Also the remaining inclusion

$$(2)\quad [\![n']\!] \sqsubseteq ltr^k[n'],\quad n' \in N^S$$

requires the investigation of two cases depending on the nature of the node $n \in N^S$ being processed in the $k$-th iteration.

Case 1. $n \notin \{e_0, \ldots, e_k\}$. This guarantees that $ltr$ remains unchanged in the $k$-th iteration of the while-loop. Thus $[\![n']\!] \sqsubseteq ltr^k[n']$, $n' \in N^S$, is a direct consequence of the induction hypothesis.



¹² $l = 0$ means that $(n, f)$ was added during the initialization of the workset.
¹³ Recall that $[\![e]\!] = Id_{\mathit{STACK}}$ for all nodes $e \in \{e_0, \ldots, e_k\}$.
Case 2. $n \in \{e_0, \ldots, e_k\}$. In this case we have

$$ltr^k[l] = \begin{cases} ltr^{k-1}[l] \sqcap [\![l_R(fg(n))]\!]^* \circ gtr^k[n] \circ [\![l_C(fg(n))]\!]^* & \text{if } l \in caller(fg(n)) \\ ltr^{k-1}[l] & \text{otherwise} \end{cases}$$

Thus $[\![n']\!] \sqsubseteq ltr^k[n']$ is a direct consequence of the first part of Lemma 8.4.2, the previously proved inclusion $imop_n \sqsubseteq gtr^k[n]$, Definition 8.3.2, and the induction hypothesis.

□

By means of Lemma 8.4.2 we can now prove the Main Lemma 8.4.3, which is central for proving the Interprocedural Correctness Theorem 8.4.1 and the Interprocedural Coincidence Theorem 8.4.2 (cf. Sections 8.4.2 and 8.4.3). In fact, by means of the Main Lemma 8.4.3 their proofs proceed almost as straightforwardly as in the intraprocedural case (cf. [Ki1, Ki2]).

Lemma 8.4.3 (The Main Lemma).

For all $n \in N^S_c$, we have, if the semantic functions $[\![m]\!]^*$, $m \in N^*$, are

1. s-monotonic: $[\![n]\!] \sqsubseteq \sqcap \{ [\![p]\!]^* \mid \iota \in callee(n),\ p \in CIP[n_C(\iota), n_R(\iota)] \}$

2. s-distributive: $[\![n]\!] = \sqcap \{ [\![p]\!]^* \mid \iota \in callee(n),\ p \in CIP[n_C(\iota), n_R(\iota)] \}$

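Proof. Let $n \in N^S_c$. The first part of Lemma 8.4.3 follows from the subsequent sequence of inequations:

$$\begin{aligned}
&\sqcap \{ [\![p]\!]^* \mid \iota \in callee(n),\ p \in CIP[n_C(\iota), n_R(\iota)] \} \\
&= \sqcap \{ \sqcap \{ [\![p]\!]^* \mid p \in CIP[n_C(\iota), n_R(\iota)] \} \mid \iota \in callee(n) \} \\
&\sqsupseteq \sqcap \{ [\![n_R(\iota)]\!]^* \circ \sqcap \{ [\![p]\!]^* \mid p \in CIP[start(\iota), end(\iota)] \} \circ [\![n_C(\iota)]\!]^* \mid \iota \in callee(n) \} && (\text{s-mon.}) \\
&\sqsupseteq \sqcap \{ [\![n_R(\iota)]\!]^* \circ \|end(\iota)\| \circ [\![n_C(\iota)]\!]^* \mid \iota \in callee(n) \} && (\text{L. 8.4.2(1)}) \\
&= [\![n]\!] && (\text{Def. 8.3.2})
\end{aligned}$$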



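The second part of Lemma 8.4.3 can be proved analogously:

$$
\begin{aligned}
& \sqcap\{ [\![ p ]\!]^* \mid \iota \in callee(n),\ p \in CIP[n_C(\iota), n_R(\iota)] \} \\
=\ & \sqcap\{ \sqcap\{ [\![ p ]\!]^* \mid p \in CIP[n_C(\iota), n_R(\iota)] \} \mid \iota \in callee(n) \} \\
=\ & \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ \sqcap\{ [\![ p ]\!]^* \mid p \in CIP[start(\iota), end(\iota)] \} \circ [\![ n_C(\iota) ]\!]^* \mid \iota \in callee(n) \} && \text{(s-distr.)} \\
=\ & \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ \lceil\!\lceil end(\iota) \rceil\!\rceil \circ [\![ n_C(\iota) ]\!]^* \mid \iota \in callee(n) \} && \text{(L. 8.4.2(2))} \\
=\ & [\![ n ]\!] && \text{(Def. 8.3.2)}
\end{aligned}
$$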



□ 



8.4.2 The Interprocedural Correctness Theorem

Similar to its intraprocedural counterpart, the IMFP-solution is a correct approximation of the IMOP-solution whenever all the local abstract semantic functions are s-monotonic:
Theorem 8.4.1 (Interprocedural Correctness Theorem).

Given a flow graph system $S = (G_0, G_1, \ldots, G_k)$ and its derived interprocedural flow graph $G^* = (N^*, E^*, s^*, e^*)$, the IMFP-solution is a correct approximation of the IMOP-solution, i.e.,

$$ \forall c_s \in C\ \forall n \in N^S.\ IMFP_{([\![\ ]\!]^*,\, c_s)}(n) \sqsubseteq IMOP_{([\![\ ]\!]^*,\, c_s)}(n) $$

if all the semantic functions $[\![ n ]\!]^*$, $n \in N^*$, are s-monotonic.

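Proof. In order to prove Theorem 8.4.1, it is sufficient to prove for all $c_s \in C$ and $n \in N^S$ the inclusion

$$ N\text{-}IMFP_{([\![\ ]\!]^*,\, c_s)}(n) \sqsubseteq N\text{-}IMOP_{([\![\ ]\!]^*,\, c_s)}(n) \qquad (8.1) $$

because the validity of the second inclusion

$$ X\text{-}IMFP_{([\![\ ]\!]^*,\, c_s)}(n) \sqsubseteq X\text{-}IMOP_{([\![\ ]\!]^*,\, c_s)}(n) \qquad (8.2) $$

follows immediately from (8.1), Definition 8.3.2, Equation System 8.3.3, and the Main Lemma 8.4.3(1).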

Throughout the proof we abbreviate $N\text{-}IMFP_{([\![\ ]\!]^*,\, c_s)}$ by $N\text{-}IMFP_{c_s}$, and prove (8.1) by equivalently showing for all $c_s \in C$:

$$ (*) \quad \forall n \in N^S\ \forall p \in IP[s^*, n[.\ \ N\text{-}IMFP_{c_s}(n) \sqsubseteq [\![ p ]\!]^*(newstack(c_s)) $$

Formula (*) is now proved by induction on the length $k$ of path $p$. The case $k = 0$ follows from the sequence of equations

$$ [\![ p ]\!]^*(newstack(c_s)) = [\![ \varepsilon ]\!]^*(newstack(c_s)) = newstack(c_s) = pre_{c_s}(s_1) $$

Hence, let $k > 0$ and assume that (*) holds for all paths $q$ with $\lambda_q < k$, i.e.,

$$ (IH) \quad \forall n \in N^S\ \forall q \in IP[s^*, n[ \text{ with } 0 \le \lambda_q < k.\ \ N\text{-}IMFP_{c_s}(n) \sqsubseteq [\![ q ]\!]^*(newstack(c_s)) $$

Now, given a node $n \in N^S$, it is sufficient to show for each path $p \in IP[s^*, n[$ with $\lambda_p = k$:

$$ N\text{-}IMFP_{c_s}(n) \sqsubseteq [\![ p ]\!]^*(newstack(c_s)) $$

Without loss of generality, we can thus assume that there is such a path $p$, which obviously can be rewritten as $p = p';(m)$ for some $m \in pred^*(n)$ and $p' \in IP[s^*, m[$. Moreover, for $m \notin N^*_r$ we have by induction hypothesis

$$ N\text{-}IMFP_{c_s}(m) \sqsubseteq [\![ p' ]\!]^*(newstack(c_s)) $$

We are now left with the investigation of three cases in order to complete the proof of formula (*).

Case 1. $m \in N^* \setminus (N^*_c \cup N^*_r)$

In this case we succeed straightforwardly by considering the following inclusions resulting from Equation System 8.3.3:

$$
\begin{aligned}
N\text{-}IMFP_{c_s}(n) &= \sqcap\{ post_{c_s}(l) \mid l \in pred_{fg(n)}(n) \} \\
&\sqsubseteq post_{c_s}(m) && (m \in pred_{fg(n)}(n)) \\
&= [\![ m ]\!]^*(N\text{-}IMFP_{c_s}(m)) && (m \notin N_c^S) \\
&\sqsubseteq [\![ m ]\!]^*([\![ p' ]\!]^*(newstack(c_s))) && (\text{IH, s-monotonicity}) \\
&= [\![ p ]\!]^*(newstack(c_s)) && (p = p';(m))
\end{aligned}
$$

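Case 2. $m \in N^*_c$

In this case we have $n \in \{s_0, \ldots, s_k\}$, and there exists a node $\hat m \in N_c^S$ corresponding to $m = \hat m_C(fg(n))$ in $S$. Thus we obtain as above:

$$
\begin{aligned}
N\text{-}IMFP_{c_s}(n) &= \sqcap\{ [\![ l_C(fg(n)) ]\!]^*(pre_{c_s}(l)) \mid l \in caller(fg(n)) \} \\
&\sqsubseteq [\![ \hat m_C(fg(n)) ]\!]^*(pre_{c_s}(\hat m)) && (\hat m \in caller(fg(n))) \\
&= [\![ \hat m_C(fg(n)) ]\!]^*(N\text{-}IMFP_{c_s}(\hat m)) \\
&\sqsubseteq [\![ \hat m_C(fg(n)) ]\!]^*([\![ p' ]\!]^*(newstack(c_s))) && (\text{IH, s-monotonicity}) \\
&= [\![ p ]\!]^*(newstack(c_s)) && (p = p';(m))
\end{aligned}
$$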

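Case 3. $m \in N^*_r$

Here, there exists a predecessor $\hat m \in N_c^S$ of $n$ in $S$ corresponding to $m$, i.e., $\hat m_R(fg(m)) = m$. Considering now the following decomposition of path $p$

$$ p = \bar p;(\hat m_C(fg(m)));(start(fg(m)), \ldots, end(fg(m)));(\hat m_R(fg(m))) $$

and identifying $\hat m$ and $\hat m_C(fg(m))$, the induction hypothesis yields

$$ N\text{-}IMFP_{c_s}(\hat m_C(fg(m))) \sqsubseteq [\![ \bar p ]\!]^*(newstack(c_s)) $$

Thus, together with Equation System 8.3.3 we obtain as desired:

$$
\begin{aligned}
N\text{-}IMFP_{c_s}(n) &= \sqcap\{ post_{c_s}(l) \mid l \in pred_{fg(n)}(n) \} \\
&\sqsubseteq post_{c_s}(\hat m) && (\hat m \in pred_{fg(n)}(n)) \\
&= [\![ \hat m ]\!](N\text{-}IMFP_{c_s}(\hat m_C(fg(m)))) \\
&\sqsubseteq [\![ \hat m ]\!]([\![ \bar p ]\!]^*(newstack(c_s))) && (\text{IH, s-monot.}) \\
&\sqsubseteq \sqcap\{ [\![ p'' ]\!]^*([\![ \bar p ]\!]^*(newstack(c_s))) \mid p'' \in CIP[\hat m_C(fg(m)), \hat m_R(fg(m))] \} && (\text{M.L. } 8.4.3(1)) \\
&\sqsubseteq [\![ \hat m_R(fg(m)) ]\!]^*([\![ (start(fg(m)), \ldots, end(fg(m))) ]\!]^*([\![ \hat m_C(fg(m)) ]\!]^*([\![ \bar p ]\!]^*(newstack(c_s))))) && (\text{Lemma 7.2.2}) \\
&= [\![ p ]\!]^*(newstack(c_s))
\end{aligned}
$$

This completes the proof of the Interprocedural Correctness Theorem 8.4.1.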

□ 

8.4.3 The Interprocedural Coincidence Theorem

Like its intraprocedural counterpart, the IMFP-solution is precise for the IMOP-solution whenever the local abstract semantic functions are s-distributive:



14 The situation is illustrated in Figure 8.4.1.
Theorem 8.4.2 (Interprocedural Coincidence Theorem).

Given a flow graph system $S = (G_0, G_1, \ldots, G_k)$ and its derived interprocedural flow graph $G^* = (N^*, E^*, s^*, e^*)$, the IMFP-solution is precise for the IMOP-solution, i.e.,

$$ \forall c_s \in C\ \forall n \in N^S.\ IMFP_{([\![\ ]\!]^*,\, c_s)}(n) = IMOP_{([\![\ ]\!]^*,\, c_s)}(n) $$

if all the semantic functions $[\![ n ]\!]^*$, $n \in N^*$, are s-distributive.

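Proof. The first inclusion, "$\sqsubseteq$", is a consequence of the Interprocedural Correctness Theorem 8.4.1. In order to prove the second inclusion, "$\sqsupseteq$", we have to show for all $c_s \in C$ and $n \in N^S$ the following two inclusions:

$$ N\text{-}IMFP_{([\![\ ]\!]^*,\, c_s)}(n) \sqsupseteq N\text{-}IMOP_{([\![\ ]\!]^*,\, c_s)}(n) \qquad (8.3) $$

$$ X\text{-}IMFP_{([\![\ ]\!]^*,\, c_s)}(n) \sqsupseteq X\text{-}IMOP_{([\![\ ]\!]^*,\, c_s)}(n) \qquad (8.4) $$

In order to prove (8.3) and (8.4) we anticipate Algorithm 8.5.2 and Theorem 8.5.2. This does not cause any problems because they do not rely on the results of Section 8.4. Moreover, to simplify the notation, we abbreviate $N\text{-}IMOP_{([\![\ ]\!]^*,\, c_s)}$ and $X\text{-}IMOP_{([\![\ ]\!]^*,\, c_s)}$ throughout the proof by $N\text{-}IMOP_{c_s}$ and $X\text{-}IMOP_{c_s}$, respectively. Now (8.3) and (8.4) will be proved by proving by induction on $k$ the equivalent formulas:

$$ \forall k \ge 0\ \forall n \in N^S.\ N\text{-}IMOP_{c_s}(n) \sqsubseteq pre^k[n] \qquad (8.5) $$

$$ \forall k \ge 0\ \forall n \in N^S.\ X\text{-}IMOP_{c_s}(n) \sqsubseteq post^k[n] \qquad (8.6) $$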



Actually, it is even sufficient to prove (8.5) because of the following two facts. First, for every node $n \in N^S$ we have:

$$
\begin{aligned}
X\text{-}IMOP_{c_s}(n) &= \sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n] \} && (\text{Definition 8.3.1}) \\
&= \sqcap\{ [\![ n ]\!]([\![ p ]\!]^*(newstack(c_s))) \mid p \in IP[s^*, n[ \} && (\text{Def. 8.3.2, M.L. 8.4.3(2)}) \\
&= [\![ n ]\!](\sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n[ \}) && (\text{Lemma 8.4.1}) \\
&= [\![ n ]\!](N\text{-}IMOP_{c_s}(n)) && (\text{Definition 8.3.1})
\end{aligned}
$$

Second, an investigation of Algorithm 8.5.2 yields:

$$ \forall k \ge 0\ \forall n \in N^S.\ post^k[n] = [\![ n ]\!](pre^k[n]) $$

Combining these facts with the distributivity of the local semantic functions, (8.5) implies (8.6) as desired.



We remark that $pre^k[n]$ and $post^k[n]$ denote the values of $pre[n]$ and $post[n]$ after the $k$-th execution of the while-loop of Algorithm 8.5.2, respectively.
In the proof of (8.5) the case $k = 0$ is trivial according to the initialization of $pre[n]$, $n \in N^S$, with $newstack(\top)$. Thus let $k > 0$, and let us consider the $k$-th iteration of the while-loop under the induction hypothesis (IH) that (8.5) holds before the $k$-th execution of the while-loop. Then we only need to consider the node $n \in N^S$ which belongs to the workset entry $(n, stk)$ processed in the $k$-th iteration, because the information associated with all the other nodes remains unchanged in this iteration and therefore satisfies (8.5) according to the induction hypothesis (IH). If $n = s_1$, we have $stk = newstack(c_s)$, since $(s_1, newstack(c_s))$ is initially contained in the workset, and $s_1$ occurs only once in the workset. In this case we obtain

$$ N\text{-}IMOP_{c_s}(s_1) \sqsubseteq [\![ \varepsilon ]\!]^*(newstack(c_s)) = newstack(c_s) = pre^k[s_1] $$

This leaves us with the case $n \in N^S \setminus \{s_1\}$, which yields the existence of a node $m \in N^S$ whose execution in the $l$-th iteration caused $(n, stk)$ to be added to the workset. Thus we have

$$ stk = \begin{cases} [\![ m_C(fg(n)) ]\!]^*(pre^l[m]) & \text{if } n \in \{s_0, s_2, \ldots, s_k\} \\ [\![ m ]\!](pre^l[m]) & \text{otherwise} \end{cases} $$

Now the inclusion $N\text{-}IMOP_{c_s}(n) \sqsubseteq stk$ is proved by investigating three cases:

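Case 1. $m \in pred_{fg(n)}(n) \setminus N_c^S$

Here we succeed straightforwardly:

$$
\begin{aligned}
N\text{-}IMOP_{c_s}(n) &= \sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n[ \} \\
&\sqsubseteq \sqcap\{ [\![ m ]\!]^*([\![ p ]\!]^*(newstack(c_s))) \mid p \in IP[s^*, m[ \} && (m \in pred_{fg(n)}(n) \setminus N_c^S) \\
&= [\![ m ]\!]^*(\sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, m[ \}) && (\text{s-distributivity}) \\
&= [\![ m ]\!]^*(N\text{-}IMOP_{c_s}(m)) && (\text{Definition of } N\text{-}IMOP) \\
&= [\![ m ]\!](N\text{-}IMOP_{c_s}(m)) && (m \notin N_c^S) \\
&\sqsubseteq [\![ m ]\!](pre^l[m]) = stk && (\text{IH, s-monotonicity})
\end{aligned}
$$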

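Case 2. $m \in pred_{fg(n)}(n) \cap N_c^S$

Here we obtain as required:

$$
\begin{aligned}
N\text{-}IMOP_{c_s}(n) &= \sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n[ \} \\
&\sqsubseteq \sqcap\{ [\![ p' ]\!]^*([\![ p ]\!]^*(newstack(c_s))) \mid p \in IP[s^*, m[ \wedge \iota \in callee(m),\ p' \in CIP[m_C(\iota), m_R(\iota)] \} \\
&= \sqcap\{ [\![ p' ]\!]^*(\sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, m[ \}) \mid \iota \in callee(m),\ p' \in CIP[m_C(\iota), m_R(\iota)] \} && (\text{s-distr.}) \\
&= \sqcap\{ [\![ p' ]\!]^*(N\text{-}IMOP_{c_s}(m)) \mid \iota \in callee(m),\ p' \in CIP[m_C(\iota), m_R(\iota)] \} && (\text{Def. } N\text{-}IMOP) \\
&= [\![ m ]\!](N\text{-}IMOP_{c_s}(m)) && (\text{M.L. 8.4.3(2)}) \\
&\sqsubseteq [\![ m ]\!](pre^l[m]) = stk && (\text{IH, s-mon.})
\end{aligned}
$$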

$l = 0$ means that $m$ was processed during the initialization part of Algorithm 8.5.2.
Case 3. $m \notin pred_{fg(n)}(n)$

In this case we have $m \in caller(fg(n))$ and $n \in \{s_0, \ldots, s_k\}$. Identifying the nodes $m$ and $m_C(fg(n))$ we obtain as desired:

$$
\begin{aligned}
N\text{-}IMOP_{c_s}(n) &= \sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, n[ \} \\
&\sqsubseteq \sqcap\{ [\![ m_C(fg(n)) ]\!]^*([\![ p ]\!]^*(newstack(c_s))) \mid p \in IP[s^*, m_C(fg(n))[ \} \\
&= [\![ m_C(fg(n)) ]\!]^*(\sqcap\{ [\![ p ]\!]^*(newstack(c_s)) \mid p \in IP[s^*, m_C(fg(n))[ \}) && (\text{s-distr.}) \\
&= [\![ m_C(fg(n)) ]\!]^*(N\text{-}IMOP_{c_s}(m)) && (\text{Def. } N\text{-}IMOP) \\
&\sqsubseteq [\![ m_C(fg(n)) ]\!]^*(pre^l[m]) = stk && (\text{IH, s-mon.})
\end{aligned}
$$



Thus, together with the induction hypothesis, we uniformly have in all three cases as required:

$$ N\text{-}IMOP_{c_s}(n) \sqsubseteq pre^{k-1}[n] \sqcap stk = pre^k[n] $$

This completes the proof of the Interprocedural Coincidence Theorem 8.4.2.

□ 

It is worth noting that Lemma 8.2.3 allows us to check the s-monotonicity and s-distributivity of the semantic functions $[\![ n ]\!]^*$, which are the premises of the Interprocedural Correctness Theorem 8.4.1 and the Interprocedural Coincidence Theorem 8.4.2, simply by checking these properties for the semantic functions $[\![ n ]\!]'$ and the return functions $\mathcal{R}(n)$. Thus, in comparison to the intraprocedural setting, the only additional effort arises from checking the return functions, which is important for applying the framework in practice.
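The difference between the two premises can be seen on a small example. The following Python sketch is our own intraprocedural illustration, not part of the framework: it assumes a hypothetical constant-propagation lattice per variable (`BOT`, "not constant", below all integer constants, which lie below `TOP`, "no information yet"), in which the transfer function for `z := x + y` is monotone but not distributive. Meeting at the join node before applying it, as the fixed point approach does, yields a correct but strictly less precise result than meeting the results of the two complete paths.

```python
# Hypothetical constant-propagation lattice per variable:
# BOT ("not constant") <= every integer constant <= TOP ("no information yet").
BOT, TOP = "BOT", "TOP"


def meet_val(a, b):
    """Meet of two abstract values in the flat lattice."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT


def leq_val(a, b):
    """a <= b in the flat lattice."""
    return a == b or a == BOT or b == TOP


def meet_env(e1, e2):
    """Pointwise meet of two environments (variable -> abstract value)."""
    return {v: meet_val(e1.get(v, TOP), e2.get(v, TOP)) for v in set(e1) | set(e2)}


def leq_env(e1, e2):
    """Pointwise order on environments."""
    return all(leq_val(e1.get(v, TOP), e2.get(v, TOP)) for v in set(e1) | set(e2))


def assign(var, expr):
    """Monotone, but not distributive, transfer function for var := expr.

    expr is either an int constant or a pair (u, w) standing for u + w.
    """
    def f(env):
        out = dict(env)
        if isinstance(expr, int):
            out[var] = expr
        else:
            a, b = env.get(expr[0], TOP), env.get(expr[1], TOP)
            if BOT in (a, b):
                out[var] = BOT
            elif TOP in (a, b):
                out[var] = TOP
            else:
                out[var] = a + b
        return out
    return f


def run(path, env):
    """Apply the transfer functions along a path, in order."""
    for f in path:
        env = f(env)
    return env


left = [assign("x", 1), assign("y", 2)]    # one branch of a diamond
right = [assign("x", 2), assign("y", 1)]   # the other branch
tail = [assign("z", ("x", "y"))]           # after the join: z := x + y

# Meet over all paths: follow each complete path, then meet at the end.
mop = meet_env(run(left + tail, {}), run(right + tail, {}))
# Fixed point style: meet at the join node first, then apply the tail.
mfp = run(tail, meet_env(run(left, {}), run(right, {})))

print(mop["z"], mfp["z"])  # 3 BOT
```

Here `mfp` lies below `mop` (correctness) without being equal to it (no precision). With distributive transfer functions, such as the gen/kill functions of bit-vector problems, the two computations would coincide.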



8.5 The Interprocedural Generic Fixed Point Algorithms

Like its intraprocedural counterpart, the IMFP-approach is practically relevant because it induces an iterative procedure for computing the IMFP-solution. Interprocedurally, however, the computation proceeds in two steps: first, a preprocess computing the semantics of procedure call nodes, which is described in Section 8.5.1; second, the main process, which then computes the IMFP-solution essentially as in the intraprocedural case, as described in Section 8.5.2. Section 8.5.3 subsequently presents an alternative algorithm for the main process, which is more efficient in practice.

8.5.1 Computing the Semantics of Procedures 

In this section we present a generic algorithm which computes the semantics of flow graphs according to Definition 8.3.2.
Algorithm 8.5.1 (Computing the Semantic Functionals $\lceil\!\lceil\ \rceil\!\rceil$ and $[\![\ ]\!]$).

Input: A flow graph system $S = (G_0, G_1, \ldots, G_k)$, a complete semi-lattice $\mathcal{C}$, a local semantic functional $[\![\ ]\!]' : N^* \to (C \to C)$ with $[\![ n ]\!]' = Id_C$ for all $n \in \{ start(G), end(G) \mid G \in S \}$, and a return functional $\mathcal{R} : N_c^S \to (C \times C \to C)$.

Output: An annotation of $S$ with functions $\lceil\!\lceil n \rceil\!\rceil : STACK \to STACK$ (stored in $gtr$, which stands for global transformation) and $[\![ n ]\!] : STACK \to STACK$ (stored in $ltr$, which stands for local transformation) representing the greatest solution of the equation system of Definition 8.3.2.

Remark: The lattice $\mathcal{C}$, the semantic functions $[\![ n ]\!]'$, $n \in N^*$, and the return functions $\mathcal{R}(n)$, $n \in N_c^S$, induce the local semantic functions $[\![ n ]\!]^*$, $n \in N^*$, working on stacks of $C$. $\top : STACK \to STACK \in \mathcal{F}_G$ denotes the "universal" function, which is assumed to "contain" every function $f \in \mathcal{F}_G$, and $Id_{STACK}$ denotes the identity on $STACK$. The variable $workset$ controls the iterative process. Its elements are pairs whose first components are nodes $m \in N^S$ of the flow graph system $S$, and whose second components are functions $f : STACK \to STACK \in \mathcal{F}_G$, which specify a new approximation for the function $\lceil\!\lceil m \rceil\!\rceil$ of the node of the first component. Note that, due to the mutual dependence of the definitions of $\lceil\!\lceil\ \rceil\!\rceil$ and $[\![\ ]\!]$, the iterative approximation of $\lceil\!\lceil\ \rceil\!\rceil$ is superposed by an interprocedural iteration step, which updates the semantics $[\![\ ]\!]$ of call nodes.

( Initialization of the annotation arrays gtr and ltr and the variable workset )
FORALL m ∈ N^S DO
  gtr[m] := ⊤;
  IF m ∈ N_c^S
    THEN ltr[m] := ⊓{ ⟦m_R(ι)⟧* ∘ ⊤ ∘ ⟦m_C(ι)⟧* | ι ∈ callee(m) }
    ELSE ltr[m] := ⟦m⟧*
  FI
OD;
workset := { (s, Id_STACK) | s ∈ {s_0, ..., s_k} } ∪
           { (n, f) | n ∈ succ_fg(m)(m) ∧ f = ltr[m] ∘ gtr[m] ⊏ ⊤ };

( Iterative fixed point computation )
WHILE workset ≠ ∅ DO
  LET (m, f) ∈ workset
  BEGIN
    workset := workset \ { (m, f) };
    IF gtr[m] ⊐ gtr[m] ⊓ f
    THEN
      gtr[m] := gtr[m] ⊓ f;
      IF m ∈ { e_i | i ∈ {0, ..., k} }
      THEN
        FORALL l ∈ caller(fg(m)) DO
          ltr[l] := ltr[l] ⊓ ⟦l_R(fg(m))⟧* ∘ gtr[m] ∘ ⟦l_C(fg(m))⟧*;
          workset := workset ∪ { (n, ltr[l] ∘ gtr[l]) | n ∈ succ_fg(l)(l) }
        OD
      ELSE
        workset := workset ∪ { (n, ltr[m] ∘ gtr[m]) | n ∈ succ_fg(m)(m) }
      FI
    FI
  END
OD.
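As a rough illustration of the worklist structure of Algorithm 8.5.1, here is a minimal Python sketch under strong simplifying assumptions that are ours, not the text's: there are no local variables, so DFA stacks collapse to single lattice elements and the call and return functions become identities; each call node has exactly one callee; and the lattice consists of the subsets of a two-element universe with intersection as meet. Functions are tabulated over all lattice elements so that they can be compared, composed and met pointwise. All names (`summaries`, `tab`, the example procedures `main` and `P`) are hypothetical.

```python
from itertools import combinations

# Simplifying assumptions (not part of Algorithm 8.5.1): no local variables,
# so stacks collapse to lattice elements and call/return functions are
# identities; each call node has exactly one callee.
UNIVERSE = frozenset({"a", "b"})
ELEMS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
         for c in combinations(sorted(UNIVERSE), r)]


def tab(f):
    """Tabulate a monotone function C -> C (meet = set intersection)."""
    return {x: frozenset(f(x)) for x in ELEMS}


TOP_FN = tab(lambda x: UNIVERSE)   # plays the role of the 'universal' function
ID_FN = tab(lambda x: x)


def compose(f, g):
    """(f o g)(x) = f(g(x))."""
    return {x: f[g[x]] for x in ELEMS}


def meet(f, g):
    """Pointwise meet of two tabulated functions."""
    return {x: f[x] & g[x] for x in ELEMS}


def summaries(procs, local, calls):
    """procs: name -> (start, end, successor map); calls: call node -> callee.

    Returns gtr (from the procedure's start to the entry of a node) and ltr
    (effect of the node itself; for call nodes, the callee summary).
    """
    succ = {n: s for (_, _, sm) in procs.values() for n, s in sm.items()}
    ends = {end: name for name, (_, end, _) in procs.items()}
    gtr = {n: TOP_FN for n in succ}
    ltr = {n: TOP_FN if n in calls else local.get(n, ID_FN) for n in succ}
    work = [(start, ID_FN) for (start, _, _) in procs.values()]
    work += [(n, compose(ltr[m], gtr[m])) for m in succ for n in succ[m]]
    while work:
        m, f = work.pop()
        new = meet(gtr[m], f)
        if new == gtr[m]:
            continue                     # no strict improvement
        gtr[m] = new
        if m in ends:                    # summary of procedure ends[m] improved
            for l, callee in calls.items():
                if callee == ends[m]:
                    ltr[l] = meet(ltr[l], gtr[m])
                    work += [(n, compose(ltr[l], gtr[l])) for n in succ[l]]
        else:
            work += [(n, compose(ltr[m], gtr[m])) for n in succ[m]]
    return gtr, ltr


procs = {
    "main": ("s1", "e1", {"s1": ["n1"], "n1": ["c1"], "c1": ["e1"], "e1": []}),
    "P": ("s2", "e2", {"s2": ["n2", "e2"], "n2": ["e2"], "e2": []}),
}
local = {"n1": tab(lambda x: x | {"a"}),            # generates a
         "n2": tab(lambda x: (x - {"a"}) | {"b"})}  # kills a, generates b
gtr, ltr = summaries(procs, local, {"c1": "P"})
print(sorted(gtr["e1"][UNIVERSE]))  # ['b']
```

In this example `P` either kills `a` (and generates `b`) or does nothing, so on the universe used here its summary `gtr["e2"]` maps every `x` to `x` without `a`; the call in `main` therefore removes the fact `a` generated by `n1` before `e1` is reached.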



In order to formulate the central property of this algorithm, we denote the values of the variables $workset$, $ltr[n]$, and $gtr[n]$ after the $k$-th execution of the while-loop by $workset^k$, $ltr^k[n]$, and $gtr^k[n]$, respectively. We have:

Proposition 8.5.1. If the semantic functions $[\![ n ]\!]^*$, $n \in N^*$, are s-monotonic, we have:

1. $\forall n \in N^S\ \forall k \in \mathbb{N}.\ ltr^k[n] \in \mathcal{F}_G \wedge gtr^k[n] \in \mathcal{F}_G$

2. $\forall n \in N^S\ \forall k \in \mathbb{N}.\ (ltr^{k+1}[n], gtr^{k+1}[n]) \sqsubseteq (ltr^k[n], gtr^k[n])$

3. $\forall n \in N^S\ \forall k \in \mathbb{N}.$
$$ ltr^k[n] = \begin{cases} \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ gtr^k[end(\iota)] \circ [\![ n_C(\iota) ]\!]^* \mid \iota \in callee(n) \} & \text{if } n \in N_c^S \\ [\![ n ]\!]^* & \text{otherwise} \end{cases} $$

The first part of Proposition 8.5.1 is a direct consequence of Lemma 8.2.2. The second part follows from the s-monotonicity of the semantic functions $[\![ n ]\!]^*$, $n \in N^*$, and the third part can be proved straightforwardly by induction on the number $k$ of iterations of the while-loop of Algorithm 8.5.1 (cf. the proof of Theorem 8.5.1). By means of Proposition 8.5.1 we can now prove the central result concerning Algorithm 8.5.1:

Theorem 8.5.1 (Algorithm 8.5.1).

If the semantic functions $[\![ n ]\!]^*$, $n \in N^*$, are s-monotonic, we have:

1. If $[C \to C]$ satisfies the descending chain condition, there is a $k_0 \in \mathbb{N}$ with
$$ \forall k \ge k_0\ \forall n \in N^S.\ (gtr^k[n], ltr^k[n]) = (gtr^{k_0}[n], ltr^{k_0}[n]) $$

2. $\forall n \in N^S.\ (\lceil\!\lceil n \rceil\!\rceil, [\![ n ]\!]) = (\sqcap\{ gtr^k[n] \mid k \ge 0 \}, \sqcap\{ ltr^k[n] \mid k \ge 0 \})$

Proof. The first part of Theorem 8.5.1 follows easily by means of Proposition 8.5.1(2). In order to prove the second part, let $fix\text{-}gtr$ and $fix\text{-}ltr$ denote an arbitrary solution of the equation system of Definition 8.3.2 for $\lceil\!\lceil\ \rceil\!\rceil$ and $[\![\ ]\!]$, respectively. Then the central step is to prove the following four invariants of Algorithm 8.5.1, which we prove simultaneously by induction on the number $k$ of iterations of the while-loop:

1. $\forall k \in \mathbb{N}\ \forall n \in N^S.\ (fix\text{-}gtr[n], fix\text{-}ltr[n]) \sqsubseteq (gtr^k[n], ltr^k[n])$

2. $\forall k \in \mathbb{N}\ \forall n \in N^S.$
$$ gtr^k[n] \sqcap \sqcap\{ f \mid (n, f) \in workset^k \} = \begin{cases} Id_{STACK} & \text{if } n \in \{s_0, \ldots, s_k\} \\ \sqcap\{ ltr^k[m] \circ gtr^k[m] \mid m \in pred_{fg(n)}(n) \} & \text{otherwise} \end{cases} $$

3. $\forall k \in \mathbb{N}\ \forall n \in N^S.$
$$ ltr^k[n] = \begin{cases} \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ gtr^k[end(\iota)] \circ [\![ n_C(\iota) ]\!]^* \mid \iota \in callee(n) \} & \text{if } n \in N_c^S \\ [\![ n ]\!]^* & \text{otherwise} \end{cases} $$

4. $\forall (m, f) \in workset^k.\ fix\text{-}gtr[m] \sqsubseteq f$

For $k = 0$, we obtain by investigating the initialization part of Algorithm 8.5.1:

(a) $\forall n \in N^S.\ gtr^0[n] = \top$

(b) $\forall n \in N^S.$
$$ ltr^0[n] = \begin{cases} \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ gtr^0[end(\iota)] \circ [\![ n_C(\iota) ]\!]^* \mid \iota \in callee(n) \} & \text{if } n \in N_c^S \\ [\![ n ]\!]^* & \text{otherwise} \end{cases} $$

(c) $workset^0 = \{ (s, Id_{STACK}) \mid s \in \{s_0, \ldots, s_k\} \} \cup \{ (n, f) \mid n \in succ_{fg(m)}(m) \wedge f = ltr^0[m] \circ gtr^0[m] \sqsubset \top \}$

Hence, invariant (1) is an immediate consequence of (a), (b), and Definition 8.3.2; invariant (2) holds because of (a) and (c); invariant (3) is a consequence of (b); and invariant (4), finally, holds because of (c), Definition 8.3.2, and the choice of fix-gtr.



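For the induction step let $k > 0$ and let $(m, f)$ be the element currently chosen from the workset. Obviously, we have:

(d)
$$ gtr^k[n] = \begin{cases} gtr^{k-1}[n] \sqcap f & \text{if } n = m \wedge gtr^{k-1}[n] \sqsupset gtr^{k-1}[n] \sqcap f \\ gtr^{k-1}[n] & \text{otherwise} \end{cases} $$

(e)
$$ ltr^k[n] = \begin{cases} ltr^{k-1}[n] \sqcap [\![ n_R(fg(m)) ]\!]^* \circ gtr^k[m] \circ [\![ n_C(fg(m)) ]\!]^* & \text{if } n \in caller(fg(m)) \wedge m \in \{e_0, \ldots, e_k\} \wedge gtr^k[m] \sqsubset gtr^{k-1}[m] \\ ltr^{k-1}[n] & \text{otherwise} \end{cases} $$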
Moreover, the induction hypothesis yields:

(i) $\forall n \in N^S.\ (fix\text{-}gtr[n], fix\text{-}ltr[n]) \sqsubseteq (gtr^{k-1}[n], ltr^{k-1}[n])$

(ii) $fix\text{-}gtr[m] \sqsubseteq f$, and

(iii) $\forall n \in N^S.$
$$ ltr^{k-1}[n] = \begin{cases} \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ gtr^{k-1}[end(\iota)] \circ [\![ n_C(\iota) ]\!]^* \mid \iota \in callee(n) \} & \text{if } n \in N_c^S \\ [\![ n ]\!]^* & \text{otherwise} \end{cases} $$

Together this directly proves invariants (1) and (3).

During the induction step for the remaining two invariants (2) and (4), we can assume

$$ gtr^k[m] = gtr^{k-1}[m] \sqcap f \sqsubset gtr^{k-1}[m] $$

because otherwise they trivially hold by the induction hypothesis.

Considering invariant (2) first, the induction step is an immediate consequence of (d) and the induction hypothesis if $n = m$. On the other hand, if $m \in \{e_0, \ldots, e_k\}$ and $n \in \bigcup\{ succ_{fg(l)}(l) \mid l \in caller(fg(m)) \}$, or if $n \in succ_{fg(m)}(m)$, invariant (2) follows from the induction hypothesis and the update of the workset during the $k$-th iteration of the while-loop. For the remaining nodes invariant (2) is a trivial consequence of the induction hypothesis.

This leaves us with proving the induction step for invariant (4). Here we have to investigate two cases. If $m \notin \{e_0, \ldots, e_k\}$, we have to show

$$ (*) \quad \forall n \in succ_{fg(m)}(m).\ fix\text{-}gtr[n] \sqsubseteq ltr^k[m] \circ gtr^k[m] $$

and if $m \in \{e_0, \ldots, e_k\}$, we have to show

$$ (**) \quad \forall l \in caller(fg(m))\ \forall n \in succ_{fg(l)}(l).\ fix\text{-}gtr[n] \sqsubseteq ltr^k[l] \circ gtr^k[l] $$

According to the choice of fix-gtr and fix-ltr and Definition 8.3.2, we have in the first case

$$ \forall n \in succ_{fg(m)}(m).\ fix\text{-}gtr[n] \sqsubseteq fix\text{-}ltr[m] \circ fix\text{-}gtr[m] $$

and in the second one

$$ \forall l \in caller(fg(m))\ \forall n \in succ_{fg(l)}(l).\ fix\text{-}gtr[n] \sqsubseteq fix\text{-}ltr[l] \circ fix\text{-}gtr[l] $$

Thus, (*) and (**) follow immediately from invariant (1) and the s-monotonicity of all semantic functions. This completes the proof of the remaining invariant (4).

Having now proved all four invariants, the second part of Theorem 8.5.1 is a consequence of Proposition 8.5.1(2), the invariants (1), (2), and (3), and the equations

$$ \sqcap\{ gtr^k[n] \mid k \ge 0 \} = \sqcap\{ f \mid k \ge 0 \wedge (n, f) \in workset^k \} $$

and

$$ \sqcap\{ ltr^k[n] \mid k \ge 0 \} = \begin{cases} \sqcap\{ [\![ n_R(\iota) ]\!]^* \circ gtr^k[end(\iota)] \circ [\![ n_C(\iota) ]\!]^* \mid k \ge 0,\ \iota \in callee(n) \} & \text{if } n \in N_c^S \\ [\![ n ]\!]^* & \text{otherwise} \end{cases} $$

which hold due to the commutativity and associativity of $\sqcap$. □

As a corollary of Theorem 8.5.1 we get:

Corollary 8.5.1 (Algorithm 8.5.1).

If the semantic functions $[\![ n ]\!]^*$, $n \in N^*$, are s-monotonic, we have:

1. Algorithm 8.5.1 terminates if $[C \to C]$ satisfies the descending chain condition.

2. After the termination of Algorithm 8.5.1 the following holds:
$$ \forall n \in N^S.\ (\lceil\!\lceil n \rceil\!\rceil, [\![ n ]\!]) = (gtr[n], ltr[n]) $$



8.5.2 Computing the IMFP-Solution

In this section we present a generic algorithm which computes the IMFP-solution. It is based on the output of Algorithm 8.5.1.

Algorithm 8.5.2 (Computing the IMFP-Solution).

Input: A flow graph system $S = (G_0, G_1, \ldots, G_k)$, a complete semi-lattice $\mathcal{C}$, the local semantic functional $[\![\ ]\!] =_{df} ltr$ with respect to $\mathcal{C}$ (computed by Algorithm 8.5.1), for every node $m \in N_c^S$ the functions $[\![ m_C(\iota) ]\!]'$, $\iota \in callee(m)$, and a start information $c_s \in C$.

Output: An annotation of $S$ with data flow information, i.e., an annotation with pre-information (stored in $pre$) and post-information (stored in $post$) of one-entry stacks, which characterize valid data flow information at the entry and at the exit of every node, respectively.

Remark: The lattice $\mathcal{C}$ and the semantic functions $[\![ m_C(\iota) ]\!]'$ induce the semantic functions $[\![ m_C(\iota) ]\!]^*$ working on DFA-stacks. $newstack(\top)$ denotes the "universal" data flow information, which is assumed to "contain" every data flow information. The variable $workset$ controls the iterative process. Its elements are pairs whose first components are nodes $m \in N^S$ of the flow graph system $S$, and whose second components are elements of $STACK$ specifying a new approximation for the pre-information of the node of the first component. Recall that $s_1$ denotes the start node of the main procedure.
( Initialization of the annotation arrays pre and post, and the variable workset )
FORALL m ∈ N^S DO
  (pre[m], post[m]) := (newstack(⊤), ⟦m⟧(newstack(⊤)))
OD;
workset := { (s_1, newstack(c_s)) } ∪
           { (n, stk) | n ∈ succ_fg(m)(m) ∧ stk = post[m] ⊏ newstack(⊤) } ∪
           { (n, stk) | m ∈ N_c^S ∧ ι ∈ callee(m) ∧ n = start(ι) ∧
                        stk = ⟦m_C(ι)⟧*(pre[m]) ⊏ newstack(⊤) };

( Iterative fixed point computation )
WHILE workset ≠ ∅ DO
  LET (m, stk) ∈ workset
  BEGIN
    workset := workset \ { (m, stk) };
    IF pre[m] ⊐ pre[m] ⊓ stk
    THEN
      pre[m] := pre[m] ⊓ stk;
      post[m] := ⟦m⟧(pre[m]);
      workset := workset ∪ { (n, post[m]) | n ∈ succ_fg(m)(m) };
      IF m ∈ N_c^S
      THEN
        workset := workset ∪ { (start(ι), ⟦m_C(ι)⟧*(pre[m])) | ι ∈ callee(m) }
      FI
    FI
  END
OD.
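The main process can be sketched in the same spirit. The following Python fragment is an assumption-laden illustration, not the text's implementation: one-entry stacks are represented by plain sets over the hypothetical universe `{"a", "b"}` with intersection as meet, the call-edge functions are identities, and `ltr` is supplied directly, with the entry for the call node `c1` set to the summary of the called procedure, as a summary phase like Algorithm 8.5.1 would deliver it. The names `imfp`, `succ`, `calls` and the example program are hypothetical.

```python
# Toy assumptions: lattice = subsets of {"a", "b"} under inclusion, meet =
# intersection; one-entry stacks are plain sets; call-edge functions are
# identities; ltr (including callee summaries at call nodes) is given.
TOP = frozenset({"a", "b"})


def imfp(succ, ltr, calls, main_start, c_s):
    """Worklist sketch of Algorithm 8.5.2; calls maps call nodes to callee starts."""
    pre = {n: TOP for n in succ}
    post = {n: ltr[n](TOP) for n in succ}
    work = [(main_start, c_s)]
    work += [(n, post[m]) for m in succ for n in succ[m] if post[m] != TOP]
    work += [(s, pre[m]) for m, s in calls.items() if pre[m] != TOP]
    while work:
        m, stk = work.pop()
        new = pre[m] & stk
        if new == pre[m]:
            continue                      # no strict decrease
        pre[m] = new
        post[m] = ltr[m](new)
        work += [(n, post[m]) for n in succ[m]]
        if m in calls:                    # propagate into the callee
            work.append((calls[m], new))
    return pre, post


def ident(x):
    return x


succ = {"s1": ["n1"], "n1": ["c1"], "c1": ["e1"], "e1": [],
        "s2": ["n2", "e2"], "n2": ["e2"], "e2": []}
ltr = {"s1": ident, "n1": lambda x: x | {"a"},
       "c1": lambda x: x - {"a"},         # summary of the called procedure
       "e1": ident, "s2": ident, "n2": lambda x: (x - {"a"}) | {"b"},
       "e2": ident}
pre, post = imfp(succ, ltr, {"c1": "s2"}, "s1", frozenset())
print(sorted(pre["s2"]), sorted(pre["e1"]))  # ['a'] []
```

The start node `s2` of the callee receives the information valid before the call (`a` holds after `n1`), while `e1` sees the effect of the callee summary, which removes `a`.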



As in the previous section, we denote the values of the variables $workset$, $pre[n]$, and $post[n]$ after the $k$-th execution of the while-loop by $workset^k$, $pre^k[n]$, and $post^k[n]$, respectively. In analogy to Proposition 8.5.1 and Theorem 8.5.1 we can prove:

Proposition 8.5.2. If the semantic functions $[\![ n ]\!]$, $n \in N^S$, and $[\![ m_C(\iota) ]\!]^*$, $m \in N_c^S$, $\iota \in callee(m)$, are s-monotonic, we have:

1. $\forall n \in N^S\ \forall k \in \mathbb{N}.\ pop(pre^k[n]) = emptystack \wedge pop(post^k[n]) = emptystack$

2. $\forall n \in N^S\ \forall k \in \mathbb{N}.\ (pre^{k+1}[n], post^{k+1}[n]) \sqsubseteq (pre^k[n], post^k[n])$

Intuitively, the first part of Proposition 8.5.2 states that the DFA-stacks stored in the variables $pre[n]$ and $post[n]$, $n \in N^S$, always have a single entry only; the second part states that these values decrease monotonically. Together, this allows us to prove:
Theorem 8.5.2 (Algorithm 8.5.2).

If the semantic functions $[\![ n ]\!]$, $n \in N^S$, and $[\![ m_C(\iota) ]\!]^*$, $m \in N_c^S$ and $\iota \in callee(m)$, are s-monotonic, we have:

1. If $\mathcal{C}$ satisfies the descending chain condition, there is a $k_0 \in \mathbb{N}$ with
$$ \forall k \ge k_0\ \forall n \in N^S.\ (pre^k[n], post^k[n]) = (pre^{k_0}[n], post^{k_0}[n]) $$

2. $\forall n \in N^S.\ IMFP_{([\![\ ]\!]^*,\, c_s)}(n) = (\sqcap\{ pre^k[n] \mid k \ge 0 \}, \sqcap\{ post^k[n] \mid k \ge 0 \})$

Proof. The first part of Theorem 8.5.2 is a simple consequence of Proposition 8.5.2(2). Thus, we are left with proving the second part. To this end let $fix\text{-}pre$ and $fix\text{-}post$ denote an arbitrary solution of Equation System 8.3.3 with respect to $newstack(c_s) \in STACK$. As in the proof of Theorem 8.5.1(2), the essential step is to prove a number of invariants for Algorithm 8.5.2 by simultaneous induction on the number $k$ of iterations of the while-loop.

1. $\forall k \in \mathbb{N}\ \forall n \in N^S.\ (fix\text{-}pre[n], fix\text{-}post[n]) \sqsubseteq (pre^k[n], post^k[n])$

2. $\forall k \in \mathbb{N}\ \forall n \in N^S.$
$$ pre^k[n] \sqcap \sqcap\{ stk \mid (n, stk) \in workset^k \} = \begin{cases} newstack(c_s) & \text{if } n = s_1 \\ \sqcap\{ [\![ m_C(fg(n)) ]\!]^*(pre^k[m]) \mid m \in caller(fg(n)) \} & \text{if } n \in \{s_0, s_2, \ldots, s_k\} \\ \sqcap\{ post^k[m] \mid m \in pred_{fg(n)}(n) \} & \text{otherwise} \end{cases} $$

3. $\forall k \in \mathbb{N}\ \forall n \in N^S.\ post^k[n] = [\![ n ]\!](pre^k[n])$

4. $\forall (m, stk) \in workset^k.\ fix\text{-}pre[m] \sqsubseteq stk$

For $k = 0$, the investigation of the initialization part of Algorithm 8.5.2 yields:

(a) $\forall n \in N^S.\ (pre^0[n], post^0[n]) = (newstack(\top), [\![ n ]\!](newstack(\top)))$, and

(b)
$$
\begin{aligned}
workset^0 = {} & \{ (s_1, newstack(c_s)) \} \\
\cup\ & \{ (n, stk) \mid n \in succ_{fg(m)}(m) \wedge stk = post^0[m] \sqsubset newstack(\top) \} \\
\cup\ & \{ (n, stk) \mid m \in N_c^S \wedge \iota \in callee(m) \wedge n = start(\iota) \wedge stk = [\![ m_C(\iota) ]\!]^*(pre^0[m]) \sqsubset newstack(\top) \}
\end{aligned}
$$

Thus, invariant (1) is an immediate consequence of (a) and Equation System 8.3.3; invariant (2) follows from (a) and (b); invariant (3) holds because of (a); and invariant (4), finally, holds because of (b), Equation System 8.3.3, and the choice of fix-pre.



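In order to prove the induction step let $k > 0$ and let $(m, stk)$ be the element currently chosen from the workset. Obviously, we have

(c)
$$ pre^k[n] = \begin{cases} pre^{k-1}[n] \sqcap stk & \text{if } n = m \wedge pre^{k-1}[n] \sqsupset pre^{k-1}[n] \sqcap stk \\ pre^{k-1}[n] & \text{otherwise} \end{cases} $$

(d)
$$ post^k[n] = \begin{cases} [\![ n ]\!](pre^k[n]) & \text{if } n = m \wedge pre^{k-1}[n] \sqsupset pre^{k-1}[n] \sqcap stk \\ post^{k-1}[n] & \text{otherwise} \end{cases} $$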
Moreover, the induction hypothesis yields:

(i) $\forall n \in N^S.\ (fix\text{-}pre[n], fix\text{-}post[n]) \sqsubseteq (pre^{k-1}[n], post^{k-1}[n])$

(ii) $fix\text{-}pre[m] \sqsubseteq stk$, and

(iii) $\forall n \in N^S.\ post^{k-1}[n] = [\![ n ]\!](pre^{k-1}[n])$

Together this immediately proves the induction step for the invariants (1) and (3).

For the proof of the induction step of the remaining two invariants (2) and (4), we can assume

$$ (pre^k[m], post^k[m]) = (pre^{k-1}[m] \sqcap stk,\ [\![ m ]\!](pre^{k-1}[m] \sqcap stk)) \sqsubset (pre^{k-1}[m], post^{k-1}[m]) $$

because otherwise they would simply hold by the induction hypothesis. Starting with invariant (2), we obtain the induction step in case of $n = m$ as an immediate consequence of (c) and the induction hypothesis. Otherwise, if $n \in succ_{fg(m)}(m)$, or if $m \in N_c^S$ and $n = start(\iota)$ for some $\iota \in callee(m)$, invariant (2) follows from the induction hypothesis and the update of the workset during the $k$-th iteration of the while-loop. For the remaining nodes invariant (2) is a trivial consequence of the induction hypothesis.

We are thus left with proving the induction step for invariant (4). Here we have to show

$$ (*) \quad \forall n \in succ_{fg(m)}(m).\ fix\text{-}pre[n] \sqsubseteq post^k[m] $$

and, in case of $m \in N_c^S$, additionally

$$ (**) \quad \forall \iota \in callee(m).\ fix\text{-}pre[start(\iota)] \sqsubseteq [\![ m_C(\iota) ]\!]^*(pre^k[m]) $$

According to the choice of fix-pre and Equation System 8.3.3, we have

$$ \forall n \in succ_{fg(m)}(m).\ fix\text{-}pre[n] \sqsubseteq fix\text{-}post[m] $$

and, in case of $m \in N_c^S$, additionally

$$ \forall \iota \in callee(m).\ fix\text{-}pre[start(\iota)] \sqsubseteq [\![ m_C(\iota) ]\!]^*(fix\text{-}pre[m]) $$

Thus, (*) and (**) follow immediately from invariant (1) and the s-monotonicity of all semantic functions. This completes the proof of the remaining invariant (4).

The second part of Theorem 8.5.2 is now a consequence of Proposition 8.5.2(2), the previously proved invariants (1) and (2), and the equation

$$ \sqcap\{ pre^k[n] \mid k \ge 0 \} = \sqcap\{ c \mid k \ge 0 \wedge (n, c) \in workset^k \} $$

which holds due to the commutativity and associativity of $\sqcap$. □
As a corollary of Theorem 8.5.2 we obtain:

Corollary 8.5.2 (Algorithm 8.5.2).

If the semantic functions $[\![ n ]\!]$, $n \in N^S$, and $[\![ m_C(\iota) ]\!]^*$, $m \in N_c^S$ and $\iota \in callee(m)$, are s-monotonic, we have:

1. Algorithm 8.5.2 terminates if $\mathcal{C}$ satisfies the descending chain condition.

2. After the termination of Algorithm 8.5.2 the following holds:
$$ \forall n \in N^S.\ IMFP_{([\![\ ]\!]^*,\, c_s)}(n) = (pre[n], post[n]) $$



8.5.3 An Efficient Variant for Computing the IMFP-Solution

In this section we present an alternative algorithm for computing the IMFP-solution, which is more efficient than Algorithm 8.5.2 because only the start nodes of the procedures of a program take part in the fixed point iteration. After the fixed point is reached, the data flow information of each of the remaining nodes $n \in N^S \setminus \{ start(G) \mid G \in S \}$ is computed in a single step by applying the semantic function $\lceil\!\lceil n \rceil\!\rceil$ to the data flow information $pre[start(fg(n))]$ valid at the entry of the start node of the procedure containing $n$. Thus, the global chain lengths, which determine the worst-case time complexities of Algorithm 8.5.3 and Algorithm 8.5.2, can be estimated by $O(p \cdot l + n)$ and $O(n \cdot l)$, respectively, where $p$ and $n$ denote the number of procedures and statements in a program, and $l$ the length of a maximal chain in $\mathcal{C}$.
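A purely illustrative calculation with hypothetical sizes (not taken from the text) shows how far the two bounds can be apart; take $p = 10$ procedures, $n = 1000$ statements, and a maximal chain length $l = 5$:

$$ p \cdot l + n = 10 \cdot 5 + 1000 = 1050 \qquad \text{versus} \qquad n \cdot l = 1000 \cdot 5 = 5000 $$

Under these assumptions the bound for Algorithm 8.5.3 is dominated by the single postprocessing step over the statements, whereas the bound for Algorithm 8.5.2 multiplies the number of statements by the chain length.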

Algorithm 8.5.3 (Computing the IMFP-Solution Efficiently).

Input: A flow graph system $S = (G_0, G_1, \ldots, G_k)$, a complete semi-lattice $\mathcal{C}$, the semantic functionals $\lceil\!\lceil\ \rceil\!\rceil =_{df} gtr$ and $[\![\ ]\!] =_{df} ltr$ with respect to $\mathcal{C}$ (computed by Algorithm 8.5.1), a local semantic functional $[\![\ ]\!]' : N^* \to (C \to C)$, and a return functional $\mathcal{R}$. Additionally, for every node $m \in N_c^S$ the functions $[\![ m_C(\iota) ]\!]'$, $\iota \in callee(m)$, and a start information $c_s \in C$.

Output: An annotation of $S$ with data flow information, i.e., an annotation with pre-information (stored in $pre$) and post-information (stored in $post$) of one-entry stacks, which characterize valid data flow information at the entry and at the exit of every node, respectively.

Remark: The lattice $\mathcal{C}$ and the semantic functions $[\![ m_C(\iota) ]\!]'$ induce the semantic functions $[\![ m_C(\iota) ]\!]^*$. $newstack(\top)$ denotes the "universal" data flow information, which is assumed to "contain" every data flow information. The variable $workset$ controls the iterative process, and the variable $A$ is a temporary storing the most recent approximation. Recall that $s_1$ denotes the start node of the main procedure.
( Initialization of the annotation arrays pre and post, and the variable workset )
FORALL s ∈ { s_i | i ∈ {0, ..., k} } DO
  IF s = s_1 THEN pre[s] := newstack(c_s) ELSE pre[s] := newstack(⊤) FI
OD;
workset := { s_i | i ∈ {0, 2, ..., k} };

( Iterative fixed point computation )
WHILE workset ≠ ∅ DO
  LET s ∈ workset
  BEGIN
    workset := workset \ {s};
    A := pre[s] ⊓ ⊓{ ⟦n_C(fg(s))⟧*(⌈⌈n⌉⌉(pre[start(fg(n))])) | n ∈ caller(fg(s)) };
    IF pre[s] ⊐ A
    THEN
      pre[s] := A;
      workset := workset ∪ { start(callee(n)) | n ∈ N_c^S ∧ fg(n) = fg(s) }
    FI
  END
OD;

( Postprocess )
FORALL n ∈ N^S \ { s_i | i ∈ {0, ..., k} } DO pre[n] := ⌈⌈n⌉⌉(pre[start(fg(n))]) OD;
FORALL n ∈ N^S DO post[n] := ⟦n⟧(pre[n]) OD.
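For illustration, here is a sketch of the efficient variant in a hypothetical toy setting: subsets of `{"a", "b"}` with intersection as meet, identity call-edge functions, and hand-written summary functions `gtr` (from the start of the enclosing procedure to the entry of a node) and `ltr` (effect of a node), which we assume to be what a summary phase would compute for the small example program. Only the start nodes take part in the iteration; every other node receives its information in the postprocess by a single application of its global function. The names `imfp_efficient`, `start_of` and the example are ours.

```python
# Toy assumptions: subsets of {"a", "b"} with meet = intersection, identity
# call-edge functions, and given summary functions gtr and ltr.
TOP = frozenset({"a", "b"})


def imfp_efficient(gtr, ltr, start_of, calls, starts, main_start, c_s):
    """Sketch of Algorithm 8.5.3: iterate on procedure start nodes only."""
    pre = {s: (c_s if s == main_start else TOP) for s in starts}
    work = set(starts) - {main_start}
    while work:
        s = work.pop()
        a = pre[s]
        for n, callee in calls.items():          # every call site of fg(s)
            if callee == s:
                a = a & gtr[n](pre[start_of[n]])
        if a != pre[s]:                           # strictly smaller
            pre[s] = a
            work |= {c for n, c in calls.items() if start_of[n] == s}
    # Postprocess: one application of the global function per node.
    for n in start_of:
        if n not in starts:
            pre[n] = gtr[n](pre[start_of[n]])
    post = {n: ltr[n](pre[n]) for n in start_of}
    return pre, post


def ident(x):
    return x


start_of = {"s1": "s1", "n1": "s1", "c1": "s1", "e1": "s1",
            "s2": "s2", "n2": "s2", "e2": "s2"}
gtr = {"s1": ident, "n1": ident, "c1": lambda x: x | {"a"},
       "e1": lambda x: x - {"a"}, "s2": ident, "n2": ident,
       "e2": lambda x: x - {"a"}}
ltr = {"s1": ident, "n1": lambda x: x | {"a"}, "c1": lambda x: x - {"a"},
       "e1": ident, "s2": ident, "n2": lambda x: (x - {"a"}) | {"b"},
       "e2": ident}
pre, post = imfp_efficient(gtr, ltr, start_of, {"c1": "s2"},
                           {"s1", "s2"}, "s1", frozenset())
print(sorted(pre["s2"]), sorted(pre["e1"]))  # ['a'] []
```

On this example the while-loop body runs once, for the only start node other than `s1`; all remaining values come from the postprocess.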



Denoting the values of $workset$, $pre[n]$, and $post[n]$ after the $k$-th execution of the while-loop by $workset^k$, $pre^k[n]$, and $post^k[n]$, respectively, we obtain, similarly to Proposition 8.5.2, Theorem 8.5.2, and Corollary 8.5.2:

Proposition 8.5.3. If the semantic functions $[\![ n ]\!]$ and $\lceil\!\lceil n \rceil\!\rceil$, $n \in N^S$, and $[\![ m_C(\iota) ]\!]^*$, $m \in N_c^S$ and $\iota \in callee(m)$, are s-monotonic, we have:

1. $\forall n \in N^S\ \forall k \in \mathbb{N}.\ pop(pre^k[n]) = emptystack \wedge pop(post^k[n]) = emptystack$

2. $\forall n \in N^S\ \forall k \in \mathbb{N}.\ (pre^{k+1}[n], post^{k+1}[n]) \sqsubseteq (pre^k[n], post^k[n])$

Theorem 8.5.3 (Algorithm 8.5.3).

If the semantic functions $[\![ n ]\!]$ and $\lceil\!\lceil n \rceil\!\rceil$, $n \in N^S$, and $[\![ m_C(\iota) ]\!]^*$, $m \in N_c^S$ and $\iota \in callee(m)$, are s-monotonic, we have:

1. If $\mathcal{C}$ satisfies the descending chain condition, there is a $k_0 \in \mathbb{N}$ with
$$ \forall k \ge k_0\ \forall n \in N^S.\ (pre^k[n], post^k[n]) = (pre^{k_0}[n], post^{k_0}[n]) $$

2. $\forall n \in N^S.\ IMFP_{([\![\ ]\!]^*,\, c_s)}(n) = (\sqcap\{ pre^k[n] \mid k \ge 0 \}, \sqcap\{ post^k[n] \mid k \ge 0 \})$
Corollary 8.5.3 (Algorithm 8.5.3).

If the semantic functions $[\![ n ]\!]$ and $\lceil\!\lceil n \rceil\!\rceil$, $n \in N^S$, and $[\![ m_C(\iota) ]\!]^*$, $m \in N_c^S$ and $\iota \in callee(m)$, are s-monotonic, we have:

1. Algorithm 8.5.3 terminates if $\mathcal{C}$ satisfies the descending chain condition.

2. After the termination of Algorithm 8.5.3 the following holds:
$$ \forall n \in N^S.\ IMFP_{([\![\ ]\!]^*,\, c_s)}(n) = (pre[n], post[n]) $$



8.6 Formal Specification of IDFA-Algorithms

Summarizing the presentation of the previous sections, an IDFA $\mathcal{A}$ is specified by a quadruple consisting of a lattice $\mathcal{C}$, a local semantic functional $[\![\ ]\!]'$, a return functional $\mathcal{R}$, and a start information $c_s$. In comparison to the intraprocedural setting only the return functional is new. Moreover, it is important that the specification of an IDFA $\mathcal{A}$ can directly be fed into the generic Algorithms 8.5.1 and 8.5.2, which yields the pair of IDFA-algorithms for the preprocess and the main process of computing the IMFP-solution induced by $\mathcal{A}$.

Definition 8.6.1 (Specification of an IDFA-Algorithm).

The specification of an IDFA $\mathcal{A}$ is a quadruple $(\mathcal{C}, [\![\ ]\!]', \mathcal{R}, c_s)$, where

1. $\mathcal{C} = (C, \sqcap, \sqsubseteq, \bot, \top)$ is a complete semi-lattice,

2. $[\![\ ]\!]' : N^* \to (C \to C)$ is a local semantic functional,

3. $\mathcal{R} : N_c^S \to (C \times C \to C)$ is a return functional, and

4. $c_s \in C$ is a start information.

The pair of IDFA-algorithms induced by $\mathcal{A}$ results from instantiating the generic Algorithms 8.5.1 and 8.5.2 with $(\mathcal{C}, [\![\ ]\!]', \mathcal{R})$ and $(\mathcal{C}, ltr, [\![\ ]\!]', c_s)$, respectively, where $ltr$ results from Algorithm 8.5.1. The induced pair of IDFA-algorithms is denoted by $Alg(\mathcal{A}) =_{df} (Alg_1(\mathcal{A}), Alg_2(\mathcal{A}))$. The IMOP-solution of $\mathcal{A}$ and the IMFP-solution of $Alg(\mathcal{A})$ are the specifying and the algorithmic solution of $\mathcal{A}$, respectively.
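To make the shape of the quadruple concrete, here is a hypothetical Python encoding; the class and field names are ours, not the book's. It also shows what a typical return function can look like for a simple illustrative must-analysis of definitely initialized variables: the information about the caller's local variables is taken from before the call, and the information about global variables from the exit of the callee. The variable names and the sets `GLOBALS` and `LOCALS` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Fact = FrozenSet[str]


@dataclass(frozen=True)
class IDFASpec:
    """Hypothetical encoding of the quadruple (C, [[ ]]', R, c_s).

    The lattice C is represented by its top element and its meet; `local`
    maps nodes to local semantic functions, `ret` maps call nodes to return
    functions C x C -> C, and `start` is the start information c_s.
    """
    top: Fact
    meet: Callable[[Fact, Fact], Fact]
    local: Dict[str, Callable[[Fact], Fact]]
    ret: Dict[str, Callable[[Fact, Fact], Fact]]
    start: Fact


# Illustrative must-analysis: which variables are definitely initialized.
GLOBALS = frozenset({"g"})
LOCALS = frozenset({"x", "y"})


def keep_caller_locals(before_call: Fact, callee_exit: Fact) -> Fact:
    """Locals of the caller survive the call; globals come from the callee."""
    return (before_call & LOCALS) | (callee_exit & GLOBALS)


spec = IDFASpec(
    top=GLOBALS | LOCALS,
    meet=lambda a, b: a & b,
    local={"n1": lambda c: c | {"x"}, "n2": lambda c: c | {"g"}},
    ret={"c1": keep_caller_locals},
    start=frozenset(),
)

print(sorted(spec.ret["c1"](frozenset({"x"}), frozenset({"g", "y"}))))  # ['g', 'x']
```

Since `keep_caller_locals` is built from intersections and unions only, it distributes over the meet in each argument, which is the kind of property on return functions that the remark after the Coincidence Theorem asks to be checked.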

As in the intraprocedural case, the gap between an IDFA $\mathcal{A}$, which expresses the information of interest in terms of lattice elements, and a program property $\varphi$, which is a Boolean predicate, is closed by means of an interpretation function, which interprets the lattice elements in the set of Boolean truth values. This leads to the interprocedural versions of $\varphi$-correctness and $\varphi$-precision.

Definition 8.6.2 ((^-Correctness and (^-Precision of an IDFA). 

Let p be a program property, A={C,\ ,lZ,Cs) an IDFA, and Int:C^B 
an interpretation of C in B. Then A is 

Recall that C, | ]^ and TZ induce the local semantic functional [ ]*. 
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1. (/^-correct if and only if (i) Int o N -IMOP (j y 

(a) Into X-IMOP(jy , ci,) X-'P 

2. (/3-precise if and only if (i) Into N -IMOPij^y 

(a) Int o X -IMOP Y ,ce) X-P 

The notions of $\varphi$-correctness and $\varphi$-precision relate the specifying solution of an IDFA to a specific program property $\varphi$. In order to close the gap between the algorithmic solution of $A$ and $\varphi$, we introduce the interprocedural versions of MOP-correctness and MOP-precision, which, like their intraprocedural counterparts, relate the algorithmic solution to the specifying solution of $A$. Additionally, we introduce the notion of a terminating IDFA. Together, this allows us to prove the algorithmic solution of an IDFA $A$ to be precise (correct) for a property $\varphi$ essentially as in the intraprocedural case: the proof reduces to checking the premises of the Interprocedural Coincidence Theorem 8.4.2 (Interprocedural Correctness Theorem 8.4.1) and the termination of $A$. This is important because the proof of precision does not require any knowledge about the generic algorithms computing the IMFP-solution.

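Definition 8.6.3 (IMOP-Correctness, IMOP-Precision, Termination).

An IDFA $A = (\mathcal{C}, \llbracket\,\rrbracket', \mathcal{R}, c_s)$ is

1. IMOP-correct if and only if $IMFP_{(\llbracket\,\rrbracket^*, c_s)} \sqsubseteq IMOP_{(\llbracket\,\rrbracket^*, c_s)}$

2. IMOP-precise if and only if $IMFP_{(\llbracket\,\rrbracket^*, c_s)} = IMOP_{(\llbracket\,\rrbracket^*, c_s)}$

3. terminating if its induced pair of IDFA-algorithms $Alg(A)$ terminates.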

As in the intraprocedural case, IMOP-correctness, IMOP-precision, and termination of an IDFA can usually be proved straightforwardly in a few substeps. This is a consequence of Theorem 8.6.1, which follows from Lemma 8.2.3, the Interprocedural Correctness Theorem 8.4.1, the Interprocedural Coincidence Theorem 8.4.2, Corollary 8.5.1 and Corollary 8.5.2, and which gives sufficient conditions guaranteeing these properties of an IDFA.

Theorem 8.6.1 (IMOP-Correctness, IMOP-Precision, Termination).

An IDFA $A = (\mathcal{C}, \llbracket\,\rrbracket', \mathcal{R}, c_s)$ is

1. IMOP-correct if all semantic functions $\llbracket n \rrbracket'$, $n \in N^*$, and all return functions $\mathcal{R}(n)$, $n \in N^*_r$, are monotonic.

2. IMOP-precise if all semantic functions $\llbracket n \rrbracket'$, $n \in N^*$, and all return functions $\mathcal{R}(n)$, $n \in N^*_r$, are distributive.

3. terminating if

(i) $[\mathcal{C} \to \mathcal{C}]$ satisfies the descending chain condition,

(ii) all semantic functions $\llbracket n \rrbracket'$, $n \in N^*$, are monotonic,

(iii) all return functions $\mathcal{R}(n)$, $n \in N^*_r$, are monotonic.
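Why item 2 asks for distributivity while item 1 only needs monotonicity can be seen on the classic constant-propagation situation, sketched below in Python. This is a standard intraprocedural textbook example, not one taken from this chapter; the names `meet`, `assign_sum`, and `NAC` are hypothetical. Applying the monotonic but non-distributive transfer function of `z := x + y` after meeting at the join point (the MFP order of evaluation) loses information that the meet over both paths (the MOP order) retains, so the result is correct but not precise.

```python
NAC = "NAC"  # "not a constant": the least informative value for a variable


def meet(e1, e2):
    """Pointwise meet of two environments: keep a constant only where both agree."""
    return {v: (e1[v] if e1[v] == e2[v] else NAC) for v in e1}


def assign_sum(env):
    """Transfer function of z := x + y; monotonic, but not distributive."""
    if NAC in (env["x"], env["y"]):
        return {**env, "z": NAC}
    return {**env, "z": env["x"] + env["y"]}


branch1 = {"x": 1, "y": 2, "z": NAC}   # x := 1; y := 2
branch2 = {"x": 2, "y": 1, "z": NAC}   # x := 2; y := 1

mop_env = meet(assign_sum(branch1), assign_sum(branch2))  # apply along each path, then meet
mfp_env = assign_sum(meet(branch1, branch2))              # meet at the join, then apply
print(mop_env["z"], mfp_env["z"])  # 3 NAC
```

Every value of the MFP-style result is either equal to the MOP-style value or less informative, which is exactly the ordering $IMFP \sqsubseteq IMOP$ of IMOP-correctness.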
Note that for $\mathcal{C}$ itself the descending chain condition, though belonging to the premises of Corollary 8.5.2, need not be checked explicitly because of:

Lemma 8.6.1. If $[\mathcal{C} \to \mathcal{C}]$ satisfies the descending chain condition, then $\mathcal{C}$ satisfies the descending chain condition as well.

This holds because every strictly descending chain $c_0 \sqsupset c_1 \sqsupset \cdots$ in $\mathcal{C}$ induces the strictly descending chain of constant functions $\lambda x.\,c_0 \sqsupset \lambda x.\,c_1 \sqsupset \cdots$ in $[\mathcal{C} \to \mathcal{C}]$. Note that the converse implication of Lemma 8.6.1 is in general invalid. For convenience, as in the intraprocedural setting, we introduce a notion which expresses both the termination of an IDFA and its $\varphi$-correctness ($\varphi$-precision) for a given program property $\varphi$.

Definition 8.6.4 (Correctness and Precision of an IDFA).

Let $\varphi$ be a program property and $A$ an IDFA. Then $A$ is called correct (precise) for $\varphi$ if and only if $A$ is (i) terminating and (ii) $\varphi$-correct ($\varphi$-precise).



8.7 Forward, Backward, and Bidirectional IDFA-Algorithms

Like their intraprocedural counterparts, interprocedural DFA-algorithms can also be grouped into forward, backward, and bidirectional analyses according to the direction of information flow (cf. Section 2.2.7). In this chapter we developed our framework for interprocedural abstract interpretation for forward analyses. Backward analyses, however, can be dealt with by forward analyses, as in the intraprocedural counterpart of our framework, essentially after inverting the flow of control. The central step is to identify the set of nodes $N^S$ of $S$ with the set of nodes $N^* \setminus N^*_r$ of $G^*$, and to identify every node $n \in N^S_c$ with the set of nodes $\{ n_c(\iota) \mid \iota \in callee(n) \}$. As a consequence of inverting the flow of control, the return functions must be associated with call nodes instead of return nodes. In contrast to forward and backward problems, bidirectional problems are more difficult to handle because of the lack of a natural operational (or IMOP-) interpretation. As in the intraprocedural case, this problem can elegantly be overcome by decomposing bidirectional analyses into sequences of unidirectional ones. In Chapter 10 we illustrate this by means of the interprocedural extensions of the computationally and lifetime optimal algorithms for busy and lazy code motion of Chapter 3. The interprocedural algorithm for busy code motion decomposes the originally bidirectional flow of data flow information (cf. [Mo, MR2]) into a sequence of a backward analysis followed by a forward analysis. Though computationally optimal results are in general impossible in the interprocedural setting, we show that it generates interprocedurally computationally optimal programs if it is canonic. Moreover, like its intraprocedural counterpart, the algorithm can then be extended to achieve interprocedurally computationally and lifetime optimal results. This requires only two further unidirectional analyses. The resulting algorithm for lazy interprocedural code motion is the first algorithm which meets these optimality criteria.




9. A Cookbook for Optimal Interprocedural 
Program Optimization 



In this chapter we present the interprocedural counterpart of the intraprocedural cookbook for program optimization. To this end, we summarize the presentation of Chapters 5, 6, 7, and 8 for constructing an interprocedural program optimization from the designer’s point of view. As in the intraprocedural setting, the point is to provide the designer of a program optimization with concise guidelines which structure and simplify the construction process, and simultaneously hide all details of the framework which are irrelevant for its application. Following the guidelines, the construction and the corresponding optimality and precision proofs of the transformations and the IDFA-algorithms, respectively, can be carried out in a cookbook style, as in the intraprocedural setting.



9.1 Optimal Interprocedural Program Optimization 

9.1.1 Fixing the Program Transformations and the Optimality 
Criterion 

According to the interprocedural version of our two-step approach to optimal program optimization (cf. Section 7.3.1), we first have to fix the class of program transformations and the optimality criterion of interest. This requires:

Define . . .

1. a set $\Phi$ of appropriate program properties

2. the class of program transformations $\mathcal{T}$ of interest in terms of a subset of $\Phi$

3. a relation $\leq_{\mathcal{T}} \subseteq S_{\mathcal{T}} \times S_{\mathcal{T}}$ which induces the optimality criterion of interest†

The optimality criterion induced by $\leq_{\mathcal{T}}$ is the criterion of $\mathcal{O}_{\leq_{\mathcal{T}}}$-optimality in the sense of Definition 7.3.1, i.e.:

A transformation $Tr \in \mathcal{T}$ is $\mathcal{O}_{\leq_{\mathcal{T}}}$-optimal if for all $Tr' \in \mathcal{T}$ the following holds:

$$S_{Tr} \leq_{\mathcal{T}} S_{Tr'}$$

† In general, $\leq_{\mathcal{T}}$ will be a pre-order.
9.1.2 Fixing the Optimal Program Transformation

Next, the (optimal) program transformation of interest must be defined. Like the class of program transformations $\mathcal{T}$, it is defined in terms of a subset of the properties of $\Phi$, i.e.:

Define . . .

4. the program transformation $Tr_{opt}$ of interest in terms of a subset $\Phi_T \subseteq \Phi$

As in the intraprocedural case, we have to prove that the transformation $Tr_{opt}$ is a member of the transformation class under consideration and satisfies the optimality criterion of interest. Thus, we have to prove the following two steps:

Prove . . .

5. $Tr_{opt} \in \mathcal{T}$

6. $Tr_{opt}$ is $\mathcal{O}_{\leq_{\mathcal{T}}}$-optimal



9.2 Precise Interprocedural Data Flow Analysis

After proving the optimality of $Tr_{opt}$, we must define for every property $\varphi \in \Phi_T$ involved in the definition of $Tr_{opt}$ an IDFA $A_\varphi$ whose induced IDFA-algorithms compute the set of program points enjoying $\varphi$. Without loss of generality, we thus consider an arbitrary, but fixed property $\varphi$ of $\Phi_T$ in the following.

9.2.1 Specifying the IDFA

According to Section 8.6, the specification of the IDFA $A_\varphi$ and the proof of its $\varphi$-precision require the following components:

Specify . . .

7. a complete semi-lattice $(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top)$

8. a local semantic functional $\llbracket\,\rrbracket' : N^* \to (\mathcal{C} \to \mathcal{C})$

9. a return functional $\mathcal{R} : N^*_r \to (\mathcal{C} \times \mathcal{C} \to \mathcal{C})$

10. a start information $c_s \in \mathcal{C}$

11. an interpretation $Int : \mathcal{C} \to \mathcal{B}$
As in the intraprocedural case, the lattice $\mathcal{C}$ represents the data flow information of interest, the local semantic functional gives meaning to the elementary statements of the argument program, and the start information $c_s$ represents the data flow information which is assumed to be valid immediately before the execution of the argument program starts. New is the return functional, which is the handle to properly deal with local variables and value parameters of recursive procedures. Finally, as in the intraprocedural case, the function $Int$ interprets the elements of $\mathcal{C}$ as Boolean truth values and therefore closes the gap between the data flow information computed and the program property $\varphi$ of interest.

Handling Backward Analyses. We recall that in our framework backward analyses can be dealt with by means of forward analyses after inverting the flow of control (cf. Section 8.7). The roles of call nodes and return nodes are then interchanged. Hence, for backward analyses the functionality of the return functional is $\mathcal{R} : N^*_c \to (\mathcal{C} \times \mathcal{C} \to \mathcal{C})$. This is demonstrated in the applications of Section 11.1 and Section 11.4.
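The preparatory step for a backward analysis can be sketched as a graph inversion in Python. This is a minimal illustration under assumptions: the adjacency encoding, the tagging of nodes as `"call"`, `"return"`, or `"plain"`, and the function name `invert_for_backward` are hypothetical. After inversion, a return functional would be keyed by the nodes now tagged `"return"`, i.e., by the former call nodes.

```python
from typing import Dict, List, Tuple

SWAP = {"call": "return", "return": "call"}


def invert_for_backward(succ: Dict[str, List[str]],
                        kind: Dict[str, str]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Reverse all edges and interchange the roles of call and return nodes, so that
    a backward problem can be fed to a forward solver (hypothetical graph encoding)."""
    inverted: Dict[str, List[str]] = {n: [] for n in succ}
    for node, targets in succ.items():
        for t in targets:
            inverted.setdefault(t, []).append(node)
    roles = {n: SWAP.get(k, k) for n, k in kind.items()}
    return inverted, roles


succ = {"s": ["c"], "c": ["r"], "r": ["e"], "e": []}
kind = {"s": "plain", "c": "call", "r": "return", "e": "plain"}
inv, roles = invert_for_backward(succ, kind)
print(inv["r"], roles["c"])  # ['c'] return
```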

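9.2.2 Proving Precision of $A_\varphi$

Next, we have to show that $A_\varphi$ is precise for $\varphi$ in the sense of Definition 8.6.2. According to Theorem 8.6.1, the following proof steps are sufficient:

Prove . . .

12. the function lattice $[\mathcal{C} \to \mathcal{C}]$ satisfies the descending chain condition

13. the local semantic functions $\llbracket n \rrbracket'$, $n \in N^*$, are distributive

14. the return functions $\mathcal{R}(n)$, $n \in N^*_r$, are distributive

15. the specifying solution of $A_\varphi$ is $\varphi$-precise, i.e.:

(i) $Int \circ N\text{-}IMOP_{(\llbracket\,\rrbracket^*, c_s)} \Leftrightarrow N\text{-}\varphi$

(ii) $Int \circ X\text{-}IMOP_{(\llbracket\,\rrbracket^*, c_s)} \Leftrightarrow X\text{-}\varphi$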
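For small finite lattices, steps 12 to 14 can be sanity-checked mechanically: if $\mathcal{C}$ is finite, then $[\mathcal{C} \to \mathcal{C}]$ is finite and step 12 holds trivially, while monotonicity and distributivity can be tested by brute force. The Python sketch below assumes a powerset lattice ordered by inclusion with intersection as meet (as for must-analyses); the helper names and the two example functions are hypothetical. A one-argument check covers the local semantic functions; the same idea extends to the two-argument return functions by enumerating pairs.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, List

Elem = FrozenSet[str]


def powerset(universe: Iterable[str]) -> List[Elem]:
    """All subsets of a finite universe, i.e. the carrier of the powerset lattice."""
    items = sorted(universe)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def is_monotonic(f: Callable[[Elem], Elem], universe: Iterable[str]) -> bool:
    elems = powerset(universe)
    return all(f(a) <= f(b) for a in elems for b in elems if a <= b)


def is_distributive(f: Callable[[Elem], Elem], universe: Iterable[str]) -> bool:
    # Here the meet is set intersection, so distributivity means f(a & b) == f(a) & f(b).
    elems = powerset(universe)
    return all(f(a & b) == f(a) & f(b) for a in elems for b in elems)


U = {"u", "v", "w"}
gen_kill = lambda x: (x - {"v"}) | {"u"}                                # gen {u}, kill {v}
threshold = lambda x: frozenset({"u"}) if len(x) >= 2 else frozenset()  # monotonic only

print(is_distributive(gen_kill, U), is_monotonic(threshold, U), is_distributive(threshold, U))
# True True False
```

Gen/kill functions of this form are always distributive; a function like `threshold` would only justify IMOP-correctness, which triggers the additional proof obligations discussed for monotonic IDFA-problems.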



Combining Definition 8.6.2, Theorem 8.6.1, and the propositions of steps 12, 13, 14, and 15, we obtain the desired precision result:

Theorem 9.2.1 ($A_\varphi$-Precision).

$A_\varphi$ is precise for $\varphi$, i.e., $A_\varphi$ is terminating and $\varphi$-precise.

After proving Theorem 9.2.1 for all IDFAs $A_\varphi$, $\varphi \in \Phi_T$, we obtain that the transformation $Tr_{opt}$ and the transformation $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}$ induced by the algorithmic solutions of the IDFA-algorithms $Alg(A_\varphi)$, $\varphi \in \Phi_T$, coincide. Hence, we have the desired optimality result:

Theorem 9.2.2 ($\mathcal{O}$-Optimality).

The transformation $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}$ is $\mathcal{O}_{\leq_{\mathcal{T}}}$-optimal.
Monotonic IDFA-Problems: Additional Proof Obligations. For a monotonic IDFA-problem, i.e., an IDFA whose local semantic and return functions are monotonic (but not distributive), the Interprocedural Correctness Theorem 8.4.1 still guarantees IMOP-correctness of $A_\varphi$. Combined with the proposition of step 15, this even implies $\varphi$-correctness of $A_\varphi$. In general, however, this is not sufficient to guarantee that the program resulting from the transformation is correct or even $\mathcal{O}_{\leq_{\mathcal{T}}}$-optimal. The same holds if in step 15 only $\varphi$-correctness of the specifying solution of $A_\varphi$ could be proved instead of $\varphi$-precision. In both cases the following two proof obligations must additionally be verified in order to guarantee correctness and profitability of the induced transformation:

Prove . . .

16. $Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}} \in \mathcal{T}$

17. $S_{Tr_{\{A_\varphi \mid \varphi \in \Phi_T\}}} \leq_{\mathcal{T}} S$



Application: Interprocedural Code Motion. Similar to Section 2.3, we conclude this chapter with an outlook on the application considered in Chapter 10, interprocedural code motion. As for its intraprocedural counterpart, the set of program properties $\Phi$ required is basically given by the interprocedural predicates for safety, correctness, down-safety, earliestness, latestness, and isolation. The class of program transformations $\mathcal{T}$ is given by the set of interprocedurally admissible code motion transformations, which are defined in terms of the interprocedural predicates for safety and correctness. Under a natural side-condition, the interprocedural predicates for down-safety and earliestness specify the computationally optimal transformation of interprocedural busy code motion, and the interprocedural predicates for latestness and isolation the computationally and lifetime optimal transformation of interprocedural lazy code motion.





10. Optimal Interprocedural Code Motion: 
The Transformations 



In this chapter we illustrate our two-step approach for optimal interprocedural program optimization, considering interprocedural code motion as an application. At first sight, the very same strategies as in the intraprocedural setting seem to apply in order to avoid unnecessary recomputations of values. However, as we are going to show, there is a fundamental difference to the intraprocedural setting: computationally optimal results are in general impossible. Consequently, every code motion transformation must fail for some programs to yield computationally optimal results. Beyond that, intraprocedurally successful strategies can even exhibit severe anomalies interprocedurally. This applies to the interprocedural counterparts of busy and lazy code motion, too. In the interprocedural setting, the strategies of placing computations as early as possible or as late as possible, which are the guarantors of computationally and lifetime optimal results in the intraprocedural setting, can fail to be strict in the sense of [CLZ], and thus fail to guarantee even profitability. In essence, this is caused by the failure of an intraprocedural decomposition theorem for safety (cf. Safety Lemma 3.2.1). As a consequence, the conjunction of down-safety and earliestness does not imply profitability of an insertion. Insertions at down-safe earliest program points can even fail to cover any original computation.

Revealing these differences and demonstrating their impact on interprocedural code motion is a major contribution of this chapter. Going beyond this, we additionally propose a natural constraint which is sufficient to guarantee the computational and lifetime optimality of the interprocedural counterparts of busy and lazy code motion, denoted as IBCM- and ILCM-transformation, for a large class of programs.

The constraint we propose is canonicity (cf. Definition 3.2.3): whenever the IBCM-transformation is canonic for a program, its result is computationally optimal, and the result of the ILCM-transformation is lifetime optimal. Intuitively, this constraint means that every insertion of the IBCM-transformation is required, i.e., it is used on every program continuation without a preceding modification or insertion of the computation under consideration. This is quite a natural requirement for an optimization based on code motion. In fact, in the intraprocedural setting, it is even necessary for computational optimality (cf. Theorem 3.2.1).
Moreover, the introduction of this constraint allows us to capture the primary goal of this chapter: illustrating the two-step approach of our framework for optimal interprocedural program optimization, and showing how to apply the cookbook of Chapter 9. In fact, one should note that the problems encountered in interprocedural code motion are not caused by the underlying data flow analyses. These can be deduced rather straightforwardly from their intraprocedural counterparts in order to compute the interprocedural versions of the properties their intraprocedural counterparts are designed for. The subtleties encountered in interprocedural code motion are caused by the optimization itself.

In Section 10.1 we investigate and illustrate the essential differences between the intraprocedural and the interprocedural setting concerning code motion. Subsequently, we informally propose canonicity as a natural sufficient constraint for the computational and lifetime optimality of the IBCM- and ILCM-transformation. Afterwards, in Sections 10.2 and 10.3, we introduce the basic definitions required for defining interprocedural code motion transformations together with some technical lemmas simplifying the reasoning about their properties. Central is then Section 10.4, in which we present the specification of the IBCM-transformation together with its proof of interprocedural computational optimality under the premise of its canonicity for the program under consideration. Under the same premise, we subsequently present in Section 10.5 the specification of the ILCM-transformation together with its proof of interprocedural lifetime optimality. It turns out that both optimality results can be proved by means of the same techniques as in the intraprocedural case (cf. [KRS2]). Only the treatment of local variables and formal parameters of recursive procedures requires some technical refinements. In the absence of procedures, the interprocedural transformations of busy and lazy code motion reduce to their intraprocedural counterparts of Chapter 3.



10.1 Essential Differences to the Intraprocedural Setting

10.1.1 Computational Optimality 

In the intraprocedural setting, every program has a computationally optimal counterpart. Moreover, canonicity is necessary for computational optimality, i.e., computationally optimal transformations are always canonic. Neither fact carries over to the interprocedural setting, as illustrated by the examples of Figure 10.1 and Figure 10.5.

In the example of Figure 10.1, the computations of $a + b$ at the nodes 12 and 24 are partially redundant with respect to the computations of $a + b$ at the nodes 7 and 23.

Investigating this example in more detail reveals that there are only two admissible code motion transformations eliminating some of these redundancies and generating a “significantly” different and computationally better program than $\Pi_1$. Their results are shown in Figure 10.2 and Figure 10.3.

Fig. 10.1. The original program $\Pi_1$



Fig. 10.2. $\Pi_1'$: computationally minimal, but not computationally optimal
Fig. 10.3. $\Pi_1''$: computationally minimal, but not computationally optimal



Note that $\Pi_1'$ and $\Pi_1''$ are both computationally better than $\Pi_1$, but are themselves incomparable with respect to this relation, a fact which can easily be checked by means of the following table summarizing the number of computations of $a + b$ executed in $\Pi_1$, $\Pi_1'$, and $\Pi_1''$, respectively.

Program    paths via π2            paths via π4
           “left”    “right”       “left”    “right”
Π1           2          1             3         2
Π1'          1          1             2         2
Π1''         2          1             2         1



Calling procedure $\pi_3$ via $\pi_2$ in $\Pi_1'$, there is a single computation of $a + b$ on every program path; calling it via $\pi_4$, there are two computations of $a + b$ on each path. This contrasts with $\Pi_1''$, where, independently of calling $\pi_3$ via $\pi_2$ or $\pi_4$, there is a path containing two computations of $a + b$ and another one containing only one. Hence, both programs are computationally better than $\Pi_1$. However, as one can easily see, there is no computationally optimal counterpart of $\Pi_1$.
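The incomparability argument can be replayed mechanically on the four representative paths of the table above. The Python sketch below is illustrative only: the real relation "computationally better" quantifies over all interprocedurally valid paths, whereas this check uses just the four counts per program, and the ASCII names `Pi1`, `Pi1'`, `Pi1''` are stand-ins for $\Pi_1$, $\Pi_1'$, $\Pi_1''$. It compares programs pointwise and searches for a candidate that is at least as good as all others.

```python
from typing import Dict, Iterable, Tuple

# Computations of a + b per path, copied from the table above:
# (via pi2 "left", via pi2 "right", via pi4 "left", via pi4 "right")
COUNTS: Dict[str, Tuple[int, ...]] = {
    "Pi1": (2, 1, 3, 2),
    "Pi1'": (1, 1, 2, 2),
    "Pi1''": (2, 1, 2, 1),
}


def at_least_as_good(p: str, q: str) -> bool:
    """p is computationally at least as good as q: on no path does p execute more
    computations of a + b than q."""
    return all(x <= y for x, y in zip(COUNTS[p], COUNTS[q]))


def is_optimal(p: str, candidates: Iterable[str]) -> bool:
    """Optimality w.r.t. this pre-order: p is at least as good as every candidate."""
    return all(at_least_as_good(p, q) for q in candidates)


print(at_least_as_good("Pi1'", "Pi1"), at_least_as_good("Pi1''", "Pi1"))    # True True
print(at_least_as_good("Pi1'", "Pi1''"), at_least_as_good("Pi1''", "Pi1'"))  # False False
print([p for p in COUNTS if is_optimal(p, COUNTS)])  # []
```

Since the pre-order has two incomparable minimal elements, no candidate is at least as good as all others, matching the observation that $\Pi_1$ has no computationally optimal counterpart.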

Consider next the program of Figure 10.5. It shows the result of an admissible code motion transformation applied to the program of Figure 10.4. Note that the program of Figure 10.5 is computationally optimal. However, it is not canonic because the insertion at node 8 is not used along program continuations passing nodes 9 and 22. In fact, computationally optimal results can only be achieved for this example if one is prepared to drop the requirement of canonicity. The result of any canonic code motion transformation for the program of Figure 10.4 is computationally worse than the program of Figure 10.5.



Fig. 10.4. The original program $\Pi_2$

Fig. 10.5. Transformed version of $\Pi_2$: computationally optimal, but not canonic



As a consequence of these examples we have: 

Theorem 10.1.1 (Computational Optimality). 

In the interprocedural setting, 

1. computational optimality is in general impossible, 

2. canonicity is not necessary for computational optimality. 
10.1.2 Safety

In the intraprocedural setting, safety can be decomposed into up-safety and down-safety: a program point which is safe is always up-safe or down-safe. In the interprocedural setting, safety cannot be decomposed this way in general. Program points can be safe, though they are neither up-safe nor down-safe. This is demonstrated by the program of Figure 10.6. All nodes of procedure $\pi_2$ are safe, but none of them is up-safe or down-safe with respect to the computation of $a + b$.



Fig. 10.6. Safe though neither up-safe nor down-safe



Figure 10.7 shows the interprocedural flow graph of the program of Figure 10.6 and illustrates the difference to the intraprocedural setting in more detail. The point of this example is that all interprocedurally valid paths passing a node of $\pi_2$ satisfy that $a + b$ has been computed before entering $\pi_2$ or that it will be computed after leaving it. Together, this implies safety of all program points of $\pi_2$. In fact, all paths of the interprocedural flow graph lacking a computation of $a + b$ are interprocedurally invalid because they do not respect the call/return-behaviour of procedure calls, and thus do not represent legal program paths, as illustrated by the highlighted paths in Figure 10.7. Intraprocedurally, i.e., considering the graph of Figure 10.7 as an intraprocedural flow graph, the highlighted paths would be valid, excluding safety for any of the nodes 7 to 12. Thus, in contrast to the intraprocedural setting, we have:

Theorem 10.1.2 (Safety). 

In the interprocedural setting, the disjunction of up-safety and down-safety 
is not necessary for safety. 
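The argument above rests on the fact that interprocedurally valid paths respect the call/return-behaviour: a path that enters a procedure from one call site and leaves it towards another is a path of the flow graph when read intraprocedurally, but not a legal program path. A minimal Python sketch of this check is a stack discipline. It assumes a hypothetical encoding of paths as sequences of call, return, and node events, and, like the paths of $IP[s^*, n]$, it allows calls that have not yet returned.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # ("call", site) | ("ret", site) | ("node", name)


def is_interprocedurally_valid(path: Iterable[Event]) -> bool:
    """A path is accepted iff every return edge leads back to the call site that
    is currently on top of the call stack (unfinished calls are allowed)."""
    stack: List[str] = []
    for kind, label in path:
        if kind == "call":
            stack.append(label)
        elif kind == "ret":
            if not stack or stack[-1] != label:
                return False
            stack.pop()
    return True


# Hypothetical call sites c1 and c2 of the same procedure:
ok = [("node", "m1"), ("call", "c1"), ("node", "p1"), ("ret", "c1"), ("node", "m2")]
bad = [("node", "m1"), ("call", "c1"), ("node", "p1"), ("ret", "c2"), ("node", "m3")]
print(is_interprocedurally_valid(ok), is_interprocedurally_valid(bad))  # True False
```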
Fig. 10.7. Interprocedural flow graph of the program of Figure 10.6



10.1.3 Down-Safety

In the intraprocedural setting, down-safe program points are always candidates for a “canonic” insertion. This means that whenever a program point $n$ is down-safe, there is an admissible code motion transformation inserting at $n$, and the value computed and stored at $n$ reaches a use site on every program continuation without an intervening modification of any of the operands of the computation under consideration or an insertion of the same value. Moreover, every node lying on a path from a down-safe node to a node containing an original computation without an intervening modification of one of its operands is down-safe, too. Neither fact carries over to the interprocedural setting, as illustrated by the example of Figure 10.8.

Note that the computation of $a + b$ at node 10 is down-safe. Nonetheless, there is no admissible code motion transformation inserting a computation at node 10 such that this value can be used on some program continuation for replacing an original computation. In essence, this is a consequence of the interplay of the call site of $\pi_2$ at node 4 and the assignment to $a$ at node 25 modifying the value of $a + b$. Together, this prevents the insertion of $a + b$ at node 26. Moreover, note that every program path from node 10 to node 23 or node 19 passes node 21 or 32, respectively, containing a computation of $a + b$. Nonetheless, neither node 21 nor node 32 is down-safe. Summarizing, we therefore have:
Summarizing, we therefore have: 

Theorem 10.1.3 (Down-Safety).

In the interprocedural setting, 

1. down-safety of a program point n does not imply the existence of a canonic 
code motion transformation inserting at n, such that the temporary ini- 
tialized at n reaches on every program continuation a use site, 

2. nodes lying on a path from a down-safe node to a node containing an 
original computation are not necessarily down-safe, even if there is no 
intervening modification of any of its operands. 
Figure 10.9 illustrates the impact of Theorem 10.1.3. It shows a program,
which is only slightly different from the program of Figure 10.8. Though 
node 8 is down-safe (and even earliest), a temporary initialized at node 8 
cannot contribute to the elimination of any (partially) redundant computa- 
tion. This is an important difference to the intraprocedural setting, where 
down-safe earliest program points are the insertion points of the computa- 
tionally optimal BCM-transformation. 




Fig. 10.9. Down-safe and earliest, but unusable 
10.1.4 Canonicity: The Key to Computational and Lifetime Optimality

Theorem 10.1.1 excludes computational optimality in general. The failure of the decomposability of safety into up-safety and down-safety, the backbone of computationally and lifetime optimal intraprocedural code motion, turns out to be the source of placing anomalies showing up when adapting intraprocedural placing strategies interprocedurally. This applies to the as-early-as-possible and as-late-as-possible placing strategies underlying busy and lazy code motion, too.

However, as we are going to show, under the premise of canonicity the interprocedural counterpart of busy code motion is computationally optimal, and the interprocedural counterpart of lazy code motion is computationally and lifetime optimal. Moreover, as in the intraprocedural setting, the latter transformation is unique in satisfying both optimality criteria. Canonicity is a natural constraint as it requires that insertions are used on every program continuation. It can easily be checked and characterizes, for a large class of interprocedural programs, a situation where there is no difference to the intraprocedural setting.

This is illustrated by the example of Figure 10.10. It shows a program for which the IBCM-transformation is canonic, as shown in Figure 10.11, where for convenience the set of interprocedurally down-safe and earliest program points is highlighted. Note that the program of Figure 10.11 is indeed computationally optimal.



10.2 Preliminaries 

After the introductory discussion of interprocedural code motion in the previous section, we now present the basic definitions required for a formal treatment. As in the intraprocedural setting, we develop the IBCM- and ILCM-transformation with respect to an arbitrary, but fixed pair of a program $\Pi \in Prog$ and a program term $t \in \mathbf{T}$, allowing a simpler and unparameterized notation. We denote the flow graph system and the interprocedural flow graph representing $\Pi$ by $S$ and $G^*$, respectively. Without loss of generality, we can dispense with reference parameters, which according to Chapter 6 can be considered parameterless formal procedure parameters. This allows us to compute alias-information for reference parameters which can subsequently be used in a black-box fashion along the lines of [KRS4] for code motion (cf. Section 12.2.2). In the following we thus assume that $\Pi$ is free of reference parameters. If $\Pi$ contains formal procedure calls and satisfies the sfmr-property, we assume that it was analyzed by means of the HO-DFA of Chapter 6.† We thus assume that the function $callee$ yields for every (formal) procedure call the set of procedures it can potentially call. If the sfmr-property is violated, formal procedure calls are treated like external procedure calls. Without loss of generality, we finally assume that every formal procedure call node $n$ has a unique predecessor and successor. If necessary, this situation can easily be achieved by appropriately inserting a new predecessor $p$ and successor $s$ of $n$ as illustrated in Figure 10.12.

† Recall that the sfmr-property is decidable. In particular, it holds for all programs that are free of formal procedure calls or of statically nested procedures.

Fig. 10.10. The original program $\Pi_3$

Fig. 10.11. Canonic and hence computationally optimal result of the IBCM-transformation

Fig. 10.12. Unique predecessors (successors) of formal procedure call nodes

As in the intraprocedural case, the process of code motion can be blocked in completely arbitrary graph structures. The flow graph system $S$ must therefore be slightly modified in order to avoid any blocking (cf. Section 3.1). In a first step, every formal procedure call node $n$ with $|callee(n)| \geq 2$ is replaced by a set of nodes $M =_{df} \{ n(\pi) \mid \pi \in callee(n) \}$ representing pairwise disjoint ordinary procedure calls of a procedure $\pi \in callee(n)$, all having the same predecessor and successor as $n$. After this modification, every procedure call node $m$ in $S$ represents an ordinary procedure call and satisfies $|callee(m)| = 1$, as illustrated in Figure 10.13.




Fig. 10.13. “Unfolding” formal procedure calls
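The unfolding step can be sketched on a simple adjacency-list encoding of a flow graph. This Python sketch is illustrative, not the monograph's construction verbatim: the encoding, the function name `unfold_formal_calls`, and the naming scheme `n(pi)` for the new ordinary call nodes are assumptions, and the `callee` sets are taken as precomputed (e.g., by the HO-DFA mentioned above).

```python
from typing import Dict, List, Set


def unfold_formal_calls(succ: Dict[str, List[str]],
                        callee: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Replace every formal call node n with |callee(n)| >= 2 by one ordinary call
    node n(pi) per procedure pi, each with the same predecessors and successors as n."""
    copies = {n: [f"{n}({pi})" for pi in sorted(ps)]
              for n, ps in callee.items() if len(ps) >= 2}

    def redirect(targets: List[str]) -> List[str]:
        # edges into a replaced node are fanned out to all of its copies
        return [t2 for t in targets for t2 in copies.get(t, [t])]

    result: Dict[str, List[str]] = {}
    for node, targets in succ.items():
        if node in copies:
            for c in copies[node]:
                result[c] = redirect(targets)
        else:
            result[node] = redirect(targets)
    return result


succ = {"p": ["n"], "n": ["s"], "s": []}
print(unfold_formal_calls(succ, {"n": {"pi1", "pi2"}}))
# {'p': ['n(pi1)', 'n(pi2)'], 'n(pi1)': ['s'], 'n(pi2)': ['s'], 's': []}
```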
In a second step, all edges in $S$ (and hence also in $G^*$, except for edges starting in nodes of $N^*$) leading to a node with more than one incoming edge are split by inserting a synthetic node. Figure 10.14 illustrates this transformation for the program fragment of Figure 10.13.



Fig. 10.14. Splitting of edges



As in the intraprocedural case, the splitting of edges and the replacing of formal procedure calls by their associated sets of ordinary procedure calls excludes any blocking of the process of code motion. In addition, it simplifies this process because computations can uniformly be moved to node entries (cf. Section 3.1). On the other hand, it enlarges the argument program. Thus, it is worth noting that formal procedure calls are reestablished after the code motion transformation. Analogously, synthetic nodes which are not used for an insertion of the code motion transformation under consideration are removed afterwards, too.
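The second modification step can be sketched in the same adjacency-list encoding. This Python sketch works on a single flow graph and therefore ignores the exception for certain edges of $G^*$ mentioned above; it also assumes there are no parallel edges. The function name `split_join_edges` and the naming scheme `S_<src>_<dst>` for synthetic nodes are hypothetical.

```python
from typing import Dict, List


def split_join_edges(succ: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Insert a synthetic node on every edge whose target has more than one incoming
    edge. Assumes no parallel edges; synthetic names 'S_<src>_<dst>' are hypothetical."""
    indegree: Dict[str, int] = {}
    for targets in succ.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1

    result: Dict[str, List[str]] = {node: [] for node in succ}
    for node, targets in succ.items():
        for t in targets:
            if indegree[t] > 1:
                synthetic = f"S_{node}_{t}"
                result[node].append(synthetic)
                result[synthetic] = [t]
            else:
                result[node].append(t)
    return result


print(split_join_edges({"a": ["c"], "b": ["c"], "c": []}))
# {'a': ['S_a_c'], 'b': ['S_b_c'], 'c': [], 'S_a_c': ['c'], 'S_b_c': ['c']}
```

After splitting, every join node is entered only through synthetic nodes, so an insertion on any single incoming edge can be placed at the entry of its own synthetic node; unused synthetic nodes would be removed afterwards.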



10.2.1 Basic Definitions 

Global and Local Variables, and the Polymorphic Function Var.

For every procedure $\pi \in \Pi$, let $LocVar(\pi)$ denote the set of local variables and value parameters of $\pi$:

$$\forall \pi \in \Pi.\ LocVar(\pi) =_{df} I_{lv,vp}(\pi)$$

Analogously, let $GlobVar(\pi)$ denote the set of variables which are global for $\pi$:

$$\forall \pi \in \Pi.\ GlobVar(\pi) =_{df} Ext_V(\Pi) \cup I_{lv,vp}(StatPred^+(\pi))$$

In addition, let $Var$ be an abbreviation of $I_{lv,vp}$ in the following.
Range of Terms. Given a term t', its range is the set of nodes where all variables of Var(t') are bound according to the binding rules of Prog. Intuitively, the range of t' is the set of all program points where a computation of t' can safely be placed with respect to the static semantics of Prog. For example, in the program of Figure 7.1 the ranges of the terms a+b and Zi 3 + e are the nodes 9, ..., 16 and 17, ..., 23, respectively. We define:

$$\forall t' \in T.\; Range(t') =_{df} \{\, n \in N^* \mid \forall v \in Var(t').\; decl(v) \in StatPred^*(fg(n)) \,\}$$



The function fg is here extended to nodes of $N^*_c$ and $N^*_r$ according to the following convention:

$$\forall n \in N^*_c \cup N^*_r.\; fg(n) =_{df} \begin{cases} fg(n_S) & \text{if } n \in N^*_c \\ fg(pred^*(n)) & \text{if } n \in N^*_r \end{cases}$$

We recall that $n_S$ denotes the procedure call node of S corresponding to n (cf. Section 7.2.2).
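A small sketch of Range(t'), assuming (hypothetically) that static nesting is given as a `parent` map from each procedure to its enclosing procedure (`None` for the main procedure), that `decl` maps each variable to its declaring procedure, and that `fg` already maps every node, including call and return nodes, to a procedure as in the convention above. The helper names are our own.

```python
def stat_pred_star(proc, parent):
    """StatPred*(proc): proc itself and all procedures it is statically nested in."""
    chain = set()
    while proc is not None:
        chain.add(proc)
        proc = parent.get(proc)
    return chain

def term_range(term_vars, nodes, fg, decl, parent):
    """Range(t') = { n | for every v in Var(t'): decl(v) in StatPred*(fg(n)) }."""
    return {n for n in nodes
            if all(decl[v] in stat_pred_star(fg[n], parent) for v in term_vars)}

# Hypothetical program: main declares a and b; P (nested in main) declares x;
# Q is nested in main, R is nested in P.
parent = {"main": None, "P": "main", "Q": "main", "R": "P"}
decl = {"a": "main", "b": "main", "x": "P"}
fg = {1: "main", 2: "P", 3: "Q", 4: "R"}
```

With these assumptions a term over a and b has all four nodes in its range, whereas a term mentioning x is restricted to the nodes of P and of R, which is nested in P.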

Incarnations of Variables and Terms. Interprocedurally, there are potentially infinitely many incarnations of local variables of recursive procedures. In order to deal with this phenomenon formally, we define for every occurrence of a node m on a path $p \in IP[s^*,n]$, $n \in N^*$, two functions $RhsLev_p$ and $LhsLev_p$, which map every variable to a natural number giving the nesting level of its right-hand side incarnation and of its left-hand side incarnation, respectively, when m is reached on p. Right-hand side and left-hand side here refer to the “sides” of an assignment statement. For every $p \in \bigcup \{\, IP[s^*,n] \mid n \in N^* \,\}$, we define the function

$$RhsLev_p : \{1, \ldots, \lambda_p\} \to (Var(\Pi) \to \mathbb{N}_0)$$

by

$$\forall i \in \{1, \ldots, \lambda_p\}\; \forall v \in Var(\Pi).\; RhsLev_p(i)(v) =_{df} \begin{cases} 1 & \text{if } i = 1 \wedge v \in Ext_V(\Pi) \cup LocVar(\pi_1) \\ 0 & \text{if } i = 1 \wedge v \in LocVar(\Pi \setminus \{\pi_1\}) \\ RhsLev_p(i-1)(v) + 1 & \text{if } p_i = start(decl(v)) \\ RhsLev_p(i-1)(v) - 1 & \text{if } p_i \in succ^*(N^*_r) \wedge p_{i-2} = end(decl(v)) \\ RhsLev_p(i-1)(v) & \text{otherwise} \end{cases}$$

Intuitively, $RhsLev_p(i)(v) = 0$ means that currently there is no storage allocated for v. By means of $RhsLev_p$ we can now analogously define the function

$$LhsLev_p : \{1, \ldots, \lambda_p\} \to (Var(\Pi) \to \mathbb{N}_0)$$

by

$$\forall i \in \{1, \ldots, \lambda_p\}\; \forall v \in Var(\Pi).\; LhsLev_p(i)(v) =_{df} \begin{cases} RhsLev_p(i)(v) + 1 & \text{if } p_i \in N^*_c \wedge succ^*(p_i) = start(decl(v)) \\ RhsLev_p(i)(v) & \text{otherwise} \end{cases}$$
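To make the level bookkeeping concrete, the following sketch computes $RhsLev_p$ and $LhsLev_p$ along one explicit path. The node encoding (`Node` with kinds "start", "end", "call", "ret", "plain"), the function names and the 0-based indexing are our own assumptions; the case analysis is intended to follow the two definitions above, with the first path position playing the role of i = 1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    kind: str                      # "start", "end", "call", "ret" or "plain"
    proc: str                      # procedure the node belongs to
    callee: Optional[str] = None   # called procedure (call nodes only)

def rhs_levels(path, variables, decl, main, external=frozenset()):
    """RhsLev_p(i)(v) for every position of the path (0-based list of dicts)."""
    levels = []
    for i, node in enumerate(path):
        if i == 0:
            # first position: externals and locals of main exist once,
            # locals of all other procedures are not allocated yet
            levels.append({v: 1 if v in external or decl.get(v) == main else 0
                           for v in variables})
            continue
        lev = dict(levels[-1])
        for v in variables:
            if node.kind == "start" and node.proc == decl.get(v):
                lev[v] += 1          # entering decl(v) creates a new incarnation
            elif (i >= 2 and path[i - 1].kind == "ret"
                  and path[i - 2].kind == "end" and path[i - 2].proc == decl.get(v)):
                lev[v] -= 1          # first node after returning from decl(v)
        levels.append(lev)
    return levels

def lhs_levels(path, rhs, decl):
    """LhsLev_p(i)(v): one higher than RhsLev at call nodes whose callee declares v."""
    return [{v: lev + 1 if node.kind == "call" and node.callee == decl.get(v) else lev
             for v, lev in levels.items()}
            for node, levels in zip(path, rhs)]

# Hypothetical path: main calls P, P calls itself once; x is local to main, y to P.
path = [Node("start", "main"), Node("call", "main", "P"), Node("start", "P"),
        Node("call", "P", "P"), Node("start", "P"), Node("end", "P"),
        Node("ret", "P"), Node("plain", "P"), Node("end", "P"),
        Node("ret", "main"), Node("plain", "main"), Node("end", "main")]
decl = {"x": "main", "y": "P"}
rhs = rhs_levels(path, {"x", "y"}, decl, main="main")
lhs = lhs_levels(path, rhs, decl)
```

On this path the right-hand side level of y rises to 1 and 2 on entering P twice and falls back only at the node following each return, while at both call nodes the left-hand side level of y is already one higher than its right-hand side level.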

The notion of an incarnation of a variable carries over to terms: whenever a new incarnation of a variable is generated, a new incarnation of all terms containing this variable as an operand is created simultaneously. For interprocedural code motion it is necessary to distinguish the incarnations of terms that are valid at different points of a program path (see, e.g., Definition 10.4.1). Considering the term t, which we have fixed as code motion candidate, this is conveniently achieved by means of the predicate SameInc defined below. Intuitively, for $p \in \bigcup \{\, IP[s^*,n] \mid n \in N^* \,\}$ and $1 \leq i \leq j \leq \lambda_p$, the predicate $SameInc_p[i,j]$ indicates whether the same incarnation of the term t is involved at the nodes $p_i$ and $p_j$ on p.

$$SameInc_p[i,j] =_{df} RhsLev_p^{Var(t)}(i) = LhsLev_p^{Var(t)}(j) \;\wedge\; \forall\, i < k < j.\; RhsLev_p^{Var(t)}(i) \leq RhsLev_p^{Var(t)}(k)$$

$$SameInc_p[i,j[ \;=_{df} RhsLev_p^{Var(t)}(i) = RhsLev_p^{Var(t)}(j) \;\wedge\; \forall\, i < k < j.\; RhsLev_p^{Var(t)}(i) \leq RhsLev_p^{Var(t)}(k)$$

$$SameInc_p]i,j[ \;=_{df} LhsLev_p^{Var(t)}(i) = RhsLev_p^{Var(t)}(j) \;\wedge\; \forall\, i < k < j.\; RhsLev_p^{Var(t)}(i) \leq RhsLev_p^{Var(t)}(k)$$

We remark that $RhsLev_p^{Var(t)}(i) = LhsLev_p^{Var(t)}(j)$ and $RhsLev_p^{Var(t)}(i) = RhsLev_p^{Var(t)}(j)$ are used as abbreviations of

$$\forall v \in Var(t).\; RhsLev_p(i)(v) = LhsLev_p(j)(v)$$

and

$$\forall v \in Var(t).\; RhsLev_p(i)(v) = RhsLev_p(j)(v)$$

respectively.
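The three SameInc variants translate directly once the levels are available. The sketch below assumes the levels are given as lists of dictionaries (0-based, one dictionary per path position) and compares them variable by variable; the function names and the sample level values are hypothetical.

```python
def _no_drop(rhs, i, j, term_vars):
    """for all i < k < j and v in Var(t): RhsLev(i)(v) <= RhsLev(k)(v)"""
    return all(rhs[i][v] <= rhs[k][v] for k in range(i + 1, j) for v in term_vars)

def same_inc_closed(rhs, lhs, i, j, term_vars):
    """SameInc_p[i, j]"""
    return (all(rhs[i][v] == lhs[j][v] for v in term_vars)
            and _no_drop(rhs, i, j, term_vars))

def same_inc_right_open(rhs, i, j, term_vars):
    """SameInc_p[i, j["""
    return (all(rhs[i][v] == rhs[j][v] for v in term_vars)
            and _no_drop(rhs, i, j, term_vars))

def same_inc_open(rhs, lhs, i, j, term_vars):
    """SameInc_p]i, j["""
    return (all(lhs[i][v] == rhs[j][v] for v in term_vars)
            and _no_drop(rhs, i, j, term_vars))

# Hypothetical levels of y (local to P) on a path main -> P -> P (recursive call).
ys = [0, 0, 1, 1, 2, 2, 2, 1, 1, 1, 0, 0]
lys = [0, 1, 1, 2, 2, 2, 2, 1, 1, 1, 0, 0]
rhs = [{"y": y} for y in ys]
lhs = [{"y": y} for y in lys]
```

Here `same_inc_right_open(rhs, 2, 7, {"y"})` holds because position 7 follows the return from the recursive call, so the incarnation valid at position 2 is valid again; a level sequence 1, 0, 1 instead denotes a fresh incarnation, which the second conjunct rejects.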

Local Predicates Comp* and Transp*, and the Function ModLev. The predicates Comp* and Transp* are the counterparts of the intraprocedural predicates Comp and Transp. They are the basic properties for the specification of the IBCM- and ILCM-transformation, and their definitions reflect the existence of different, potentially even infinitely many incarnations of the code motion candidate t in the interprocedural setting. In fact, the incarnation of the term t which is computed or modified by a node n depends on the context of the program execution in which n occurs. This is in contrast to the intraprocedural setting, where only a single incarnation of every term exists. As a consequence, one does not need to deal with incarnation information intraprocedurally. For every path $p \in IP[s^*,e^*]$ we define:

$$Comp^*_{(p,i)}(p_j) =_{df} Comp(p_j) \wedge SameInc_p[i,j[$$

and

$$Transp^*_{(p,i)}(p_j) =_{df} RhsLev_p^{Var_{LhsVar(p_j)}(t)}(i) \neq LhsLev_p^{Var_{LhsVar(p_j)}(t)}(j) \;\vee\; \exists\, i < l < j\; \exists\, v \in Var(t).\; RhsLev_p(l)(v) < RhsLev_p(i)(v)$$

where

$$RhsLev_p^{Var_{LhsVar(p_j)}(t)}(i) \neq LhsLev_p^{Var_{LhsVar(p_j)}(t)}(j)$$

is a shorthand for

$$\forall v \in Var_{LhsVar(p_j)}(t).\; RhsLev_p(i)(v) \neq LhsLev_p(j)(v)$$



Intuitively, the predicate $Comp^*_{(p,i)}(p_j)$ holds if and only if t is computed in $p_j$ (i.e., $Comp(p_j)$), and the incarnation of t valid at $p_j$ on p is the same as the one valid at $p_i$ (i.e., $SameInc_p[i,j[$). Similarly, the predicate $Transp^*_{(p,i)}(p_j)$ is true if the variables of t modified by $p_j$ concern other incarnations of these variables than those which are valid at $p_i$ on p (i.e., $RhsLev_p^{Var_{LhsVar(p_j)}(t)}(i) \neq LhsLev_p^{Var_{LhsVar(p_j)}(t)}(j)$), or if the incarnation of t valid at $p_i$ is not valid at $p_j$ and will never become valid again (i.e., $\exists\, i < l < j\; \exists\, v \in Var(t).\; RhsLev_p(l)(v) < RhsLev_p(i)(v)$).
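Given level tables, $Comp^*$ and $Transp^*$ can be evaluated for one pair (i, j) of path positions. In this sketch, which is our own illustration, `comp[j]` stands for $Comp(p_j)$ and `modified[j]` for the set of variables written by $p_j$; the function names, the 0-based indexing and the sample data are hypothetical.

```python
def comp_star(comp, rhs, i, j, term_vars):
    """Comp*_(p,i)(p_j) = Comp(p_j) and SameInc_p[i, j[ (0-based indices)."""
    same_inc = (all(rhs[i][v] == rhs[j][v] for v in term_vars)
                and all(rhs[i][v] <= rhs[k][v] for k in range(i + 1, j) for v in term_vars))
    return comp[j] and same_inc

def transp_star(modified, rhs, lhs, i, j, term_vars):
    """Transp*_(p,i)(p_j): every variable of t written by p_j belongs to another
    incarnation than the one valid at p_i, or some variable of t dropped below
    its level at p_i strictly between i and j."""
    written = [v for v in term_vars if v in modified[j]]   # Var_{LhsVar(p_j)}(t)
    other_incarnation = all(rhs[i][v] != lhs[j][v] for v in written)
    dropped = any(rhs[l][v] < rhs[i][v] for l in range(i + 1, j) for v in term_vars)
    return other_incarnation or dropped

# Hypothetical levels of y (local to P) on a path with one recursive call of P.
rhs = [{"y": y} for y in [0, 0, 1, 1, 2, 2, 2, 1, 1, 1, 0, 0]]
lhs = [{"y": y} for y in [0, 1, 1, 2, 2, 2, 2, 1, 1, 1, 0, 0]]
comp = [False] * 12
comp[7] = True                    # t is computed at position 7
modified = [set() for _ in range(12)]
modified[3] = {"y"}               # recursive call node: writes the new incarnation of y
modified[7] = {"y"}               # position 7 also assigns y
```

With this data, position 7 computes the incarnation valid at position 2 but not the one valid at position 4, and the recursive call at position 3 is transparent for the incarnation of position 2, since it writes the inner incarnation only.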

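By means of the definitions of Comp* and Transp*, we obtain immediately:

Lemma 10.2.1.

1. $\forall p \in IP[s^*,e^*]\; \forall\, 1 \leq i \leq \lambda_p.\; Comp^*_{(p,i)}(p_i) \iff Comp(p_i)$

2. $\forall p \in IP[s^*,e^*]\; \forall\, 1 \leq i \leq \lambda_p\; \forall\, i \leq j \leq \lambda_p.\; Transp(p_j) \Rightarrow Transp^*_{(p,i)}(p_j)$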



In addition to the predicates Comp* and Transp*, the specification of the IDFA-algorithms involved in the IBCM-transformation requires the function $ModLev : N^* \to \mathcal{N}$, which stands for modified level. Here, $\mathcal{N} =_{df} \mathbb{N} \cup \{\infty\}$ denotes the disjoint union of $\mathbb{N}$ and $\{\infty\}$, which, together with the minimum function as meet operation, forms the chain-like lattice $\mathbb{N}_\infty$ of natural numbers with least element 0 and greatest element $\infty$:

$$\mathbb{N}_\infty =_{df} (\mathcal{N}, Min, \leq, 0, \infty)$$

Intuitively, among all variables of t which are modified by n, $ModLev(n)$ yields the lowest static level on which one of these variables is declared, i.e., $ModLev(n)$ yields the static level of those variables of t modified by n which are declared “most globally”.

$$ModLev : N^* \to \mathcal{N} \text{ defined by } \forall n \in N^*.\; ModLev(n) =_{df} Min(\{\, StatLevel(v) \mid v \in Var_{LhsVar(n)}(t) \,\})$$

where the function $StatLevel : V \to \mathbb{N}$ is assumed to map its arguments to the static level they are declared on.
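ModLev can be sketched with Python's `math.inf` playing the role of $\infty$ in $\mathbb{N}_\infty$, so that the built-in `min` serves as the meet and the minimum of the empty set is the top element. The names `mod_lev` and `stat_level` and the sample levels are hypothetical.

```python
import math

def mod_lev(modified_vars, term_vars, stat_level):
    """ModLev(n) = Min{ StatLevel(v) | v in Var(t) modified by n };
    Min of the empty set is the top element infinity of N_inf."""
    return min((stat_level[v] for v in term_vars & modified_vars), default=math.inf)

# Hypothetical static levels: a is declared on level 1, b on level 2, x on level 3.
stat_level = {"a": 1, "b": 2, "x": 3}
```

For a node modifying b and x the result is 2, the level of the “most globally” declared modified variable; a node modifying no variable of t yields infinity, which is neutral for the meet.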

Convention: For a predicate Predicate and an interprocedural path $p \in IP[m,n]$ of G*, we introduce, in analogy to the intraprocedural setting, the following abbreviations (cf. Section 3.1):

– $Predicate^{\forall}(p) \iff \forall\, 1 \leq i \leq \lambda_p.\; Predicate(p_i)$

– $Predicate^{\exists}(p) \iff \exists\, 1 \leq i \leq \lambda_p.\; Predicate(p_i)$

Note that $\neg Predicate^{\forall}(p)$ and $\neg Predicate^{\exists}(p)$ are then abbreviations of the formulas $\exists\, 1 \leq i \leq \lambda_p.\; \neg Predicate(p_i)$ and $\forall\, 1 \leq i \leq \lambda_p.\; \neg Predicate(p_i)$, respectively.

Global Transparency. Based on the predicate Transp*, we additionally introduce the predicate GlobTransp, which generalizes the notion of transparency uniformly to all nodes of the flow graph system S.

$$\forall n \in N^S.\; GlobTransp(n) =_{df} \begin{cases} Transp(n) & \text{if } n \in N^S \setminus N^S_c \\ \forall p \in IP[s^*,e^*]\; \forall \pi \in callee(n)\; \forall i,j.\; \big(p_i = n_C(\pi) \wedge p_j = n_R(\pi) \wedge p]i,j[ \in CIP[p_{i+1},p_{j-1}]\big) \Rightarrow Transp^{*\,\forall}_{(p,i)}(p[i,j]) & \text{otherwise} \end{cases}$$



10.2.2 First Results 

In this section we present some technical and easily provable lemmas, which are helpful for establishing the optimality of the IBCM- and ILCM-transformation.

The first lemma follows from the fact that all edges in S leading to nodes with more than one incoming edge have been split by inserting a synthetic node. This holds analogously for the corresponding edges of G*. For the remaining edges of G*, which start in nodes of $N^*_c$, the lemma is trivial because the start node of the called procedure is their unique successor.

Lemma 10.2.2 (Interprocedural Control Flow).

1. a) $\forall n \in N^S.\; |pred_{fg(n)}(n)| \geq 2 \Rightarrow succ_{fg(n)}(pred_{fg(n)}(n)) = \{n\}$

   b) $\forall n \in N^S.\; |succ_{fg(n)}(n)| \geq 2 \Rightarrow pred_{fg(n)}(succ_{fg(n)}(n)) = \{n\}$

2. a) $\forall n \in N^*.\; |pred^*(n)| \geq 2 \Rightarrow succ^*(pred^*(n)) = \{n\}$

   b) $\forall n \in N^*.\; |succ^*(n)| \geq 2 \Rightarrow pred^*(succ^*(n)) = \{n\}$
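Part 1 of the Control Flow Lemma can be checked mechanically on a single flow graph. The following sketch is a hypothetical helper (edges given as a set of pairs) that tests both conditions; a graph with an unsplit edge into a join node violates condition (a), while the same graph with that edge split by synthetic nodes satisfies it.

```python
from collections import defaultdict

def satisfies_control_flow_lemma(edges):
    """Check Lemma 10.2.2(1) for one flow graph given as a set of (u, v) edges:
    (a) |pred(n)| >= 2  implies  succ(pred(n)) = {n}
    (b) |succ(n)| >= 2  implies  pred(succ(n)) = {n}"""
    pred, succ = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    for n in set(pred) | set(succ):
        if len(pred[n]) >= 2 and set().union(*(succ[m] for m in pred[n])) != {n}:
            return False
        if len(succ[n]) >= 2 and set().union(*(pred[m] for m in succ[n])) != {n}:
            return False
    return True
```

For example, the edges a→c, a→d, b→d fail the check at d (a also branches to c), whereas a→c, a→s1, s1→d, b→s2, s2→d pass it.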

Synthetic nodes represent the empty statement “skip” . Thus, as a corollary 
of the Control Flow Lemma 10.2.2 we get: 
Corollary 10.2.1.

1. $\forall n \in N^S.\; |pred_{fg(n)}(n)| \geq 2 \Rightarrow pred_{fg(n)}(n) \subseteq N^S \setminus N^S_c \;\wedge\; \bigwedge_{m \in pred_{fg(n)}(n)} Transp(m)$

2. $\forall n \in N^* \setminus \{\, start(G) \mid G \in S \,\}.\; |pred^*(n)| \geq 2 \Rightarrow pred^*(n) \subseteq N^* \setminus N^*_r \;\wedge\; \bigwedge_{m \in pred^*(n)} Transp(m)$



The next two lemmas follow by means of the definitions of RhsLev and 
LhsLev. 

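Lemma 10.2.3. Let $p \in IP[s^*,e^*]$ and $i \in \{1, \ldots, \lambda_p\}$. We have:

1. $p_i \in N^* \setminus (N^*_c \cup N^*_r) \Rightarrow RhsLev_p^{Var(\Pi)}(i) = RhsLev_p^{Var(\Pi)}(i+1)$

2. $p_i \in N^*_c \Rightarrow \forall v \in Var(\Pi).\; RhsLev_p(i)(v) = \begin{cases} RhsLev_p(i+1)(v) - 1 & \text{if } v \in LocVar(fg(succ^*(p_i))) \\ RhsLev_p(i+1)(v) & \text{otherwise} \end{cases}$

3. $p_i \in N^*_r \Rightarrow \forall v \in Var(\Pi).\; RhsLev_p(i)(v) = \begin{cases} RhsLev_p(i+1)(v) + 1 & \text{if } v \in LocVar(fg(pred^*(p_i))) \\ RhsLev_p(i+1)(v) & \text{otherwise} \end{cases}$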



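Lemma 10.2.4. Let $p \in IP[s^*,e^*]$ and $i \in \{1, \ldots, \lambda_p\}$. We have:

1. $p_i \in N^* \setminus N^*_c \Rightarrow LhsLev_p^{Var(\Pi)}(i) = RhsLev_p^{Var(\Pi)}(i)$

2. $p_i \in N^*_c \Rightarrow \forall v \in Var(\Pi).\; RhsLev_p(i)(v) = \begin{cases} LhsLev_p(i)(v) - 1 & \text{if } v \in LocVar(fg(succ^*(p_i))) \\ LhsLev_p(i)(v) & \text{otherwise} \end{cases}$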



The call nodes and return nodes of G* modify the local variables and value 
parameters of the called procedure. Thus, we have: 

Lemma 10.2.5.

1. $\forall n \in N^*_c.\; Transp(n) \iff Var(t) \cap LocVar(fg(succ^*(n))) = \emptyset$

2. $\forall n \in N^*_r.\; Transp(n) \iff Var(t) \cap LocVar(fg(pred^*(n))) = \emptyset$



The following lemma is essentially a consequence of the fact that intervals of 
call nodes and return nodes on interprocedural paths are either disjoint or 
one is included in the other (cf. Lemma 7.2.1). 

Lemma 10.2.6. Let $p \in IP[s^*,e^*]$, and $i,j \in \{1, \ldots, \lambda_p\}$ such that $p_i \in N^*_c$ and $p_j \in N^*_r$ are a pair of matching call and return nodes of p. We have:

1. yi<l<j. 

RhsLev^’^''^^^ (t) = RhsLev^^^^^^ (j + 1) 
Recalling that t is the fixed program term under consideration, we obtain by means of Lemma 10.2.6 and the definition of Comp*:

Lemma 10.2.7. Let $p \in IP[s^*,e^*]$, and $i,j \in \{1, \ldots, \lambda_p\}$ such that $p_i \in N^*_c$ and $p_j \in N^*_r$ are a pair of matching call and return nodes of p. Then we have:

$$Var(t) \not\subseteq GlobVar(fg(succ^*(p_i))) \Rightarrow \big(\forall\, i < l < j.\; \neg Comp^*_{(p,i)}(p_l) \wedge \neg SameInc_p[i,l[\big)$$



10.3 Interprocedural Code Motion Transformations 

In essence, and quite similar to the intraprocedural case, an interprocedural code motion transformation is characterized by a three-step procedure:

1. Declare a new temporary h in the statically most deeply nested procedure of Π containing a defining occurrence of a variable of Var(t), or in the main procedure of Π, if all variables of Var(t) are free in Π.

2. Insert assignments of the form h := t at some nodes in Range(t).

3. Replace some of the original computations of t by h.

Note that the first step is shared by all interprocedural code motion transformations. Thus, the specification of a specific interprocedural code motion transformation ICM can be completed by defining two predicates $Insert_{ICM}$ and $Replace_{ICM}$, which denote the sets of program points where an initialization must be inserted and where an original computation must be replaced, respectively. As in the intraprocedural setting, we assume that $Replace_{ICM}$ implies Comp, and that the conjunction of $Insert_{ICM}$ and Comp implies $Replace_{ICM}$. This excludes transformations that keep an original computation even after an insertion into the node it occurs in, which would make that computation locally redundant. Obviously, this does not impose any restrictions on our approach.



• Declare a new temporary h in procedure π: π is the statically most deeply nested procedure of Π containing a defining occurrence of a variable of Var(t), or the main procedure of Π, if all variables of Var(t) are free in Π.

• Insert at the entry of every node satisfying $Insert_{ICM}$ the assignment h := t.

• Replace every original computation of t in nodes satisfying $Replace_{ICM}$ by h.

Table 10.1: Scheme of interprocedural code motion transformations
Note that the static semantics of Prog requires that the predicate $Insert_{ICM}$ obeys the following implication:

$$\forall n \in N^S.\; Insert_{ICM}(n) \Rightarrow n \in Range(t)$$

In the following we denote the set of all interprocedural code motion transformations with respect to t, i.e., the set of all transformations matching the scheme of Table 10.1, by $\mathcal{ICM}$.

10.3.1 Admissible Transformations 

A code motion transformation ICM is admissible if it preserves the semantics of its argument program. Intuitively, this requires that ICM is safe and correct. “Safe” means that there is no program path on which the computation of a new value is introduced, and “correct” means that h always represents the same value as t when t is replaced by h. This is reflected in the following definition, which defines when inserting and replacing a computation of t is interprocedurally safe and correct in a node $n \in N^*$. Additionally, it introduces the notion of strongly safe program points, which, as we are going to show, reveals an important difference to the intraprocedural setting.

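Definition 10.3.1 (Safety and Correctness).

For all nodes $n \in N^*$ we define:

1. Safety: $Safe(n) \iff_{df} \forall p \in IP[s^*,e^*]\; \forall i.\; (p_i = n) \Rightarrow$

   a) $\exists\, j < i.\; Comp(p_j) \wedge SameInc_p[j,i[ \wedge Transp^{*\,\forall}_{(p,j)}(p[j,i[) \;\vee$

   b) $\exists\, j \geq i.\; Comp^*_{(p,i)}(p_j) \wedge Transp^{*\,\forall}_{(p,i)}(p[i,j[)$

2. Strong Safety: $S\text{-}Safe(n) \iff_{df} N\text{-}USafe^*(n) \vee N\text{-}DSafe^*(n)$, where

   a) $N\text{-}USafe^*(n) \iff_{df} \forall p \in IP[s^*,e^*]\; \forall i.\; (p_i = n) \Rightarrow \exists\, j < i.\; Comp(p_j) \wedge SameInc_p[j,i[ \wedge Transp^{*\,\forall}_{(p,j)}(p[j,i[)$

   b) $N\text{-}DSafe^*(n) \iff_{df} \forall p \in IP[s^*,e^*]\; \forall i.\; (p_i = n) \Rightarrow \exists\, j \geq i.\; Comp^*_{(p,i)}(p_j) \wedge Transp^{*\,\forall}_{(p,i)}(p[i,j[)$

3. Correctness: Let $ICM \in \mathcal{ICM}$. Then:

$$Correct_{ICM}(n) \iff_{df} \forall p \in IP[s^*,n]\; \exists\, i.\; Insert_{ICM}(p_i) \wedge SameInc_p[i,\lambda_p[ \wedge Transp^{*\,\forall}_{(p,i)}(p[i,\lambda_p[)$$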

Obviously, we have:

Lemma 10.3.1.

$$\forall n \in N^*.\; Safe(n) \Rightarrow n \in Range(t)$$

By means of the predicates of safety and correctness we can now define the 
set of admissible interprocedural code motion transformations. 
Definition 10.3.2 (Admissible Interprocedural Code Motion).

An interprocedural code motion transformation $ICM \in \mathcal{ICM}$ is admissible if and only if every node $n \in N^S$ satisfies the following two conditions:

1. $Insert_{ICM}(n) \Rightarrow Safe(n)$

2. $Replace_{ICM}(n) \Rightarrow Correct_{ICM}(n)$

We denote the set of all admissible interprocedural code motion transformations by $\mathcal{ICM}_{Adm}$.

For admissible code motion transformations we can prove:

Lemma 10.3.2 (Correctness Lemma).

$$\forall\, ICM \in \mathcal{ICM}_{Adm}\; \forall n \in N^*.\; Correct_{ICM}(n) \Rightarrow Safe(n)$$



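Proof. Let $ICM \in \mathcal{ICM}_{Adm}$, and $n \in N^*$ satisfying $Correct_{ICM}(n)$. Then the following sequence of implications proves Lemma 10.3.2, where the last one follows easily by means of Definition 10.3.1(1) and a case analysis on j:

$$\begin{array}{ll}
& Correct_{ICM}(n) \\
\stackrel{\text{(Def. 10.3.1(3))}}{\Rightarrow} & \forall p \in IP[s^*,e^*]\; \forall k.\; p_k = n \Rightarrow \exists\, i \leq k.\; Insert_{ICM}(p_i) \wedge SameInc_p[i,k[ \wedge Transp^{*\,\forall}_{(p,i)}(p[i,k[) \\
\stackrel{(ICM \in \mathcal{ICM}_{Adm})}{\Rightarrow} & \forall p \in IP[s^*,e^*]\; \forall k.\; p_k = n \Rightarrow \exists\, i \leq k.\; Safe(p_i) \wedge SameInc_p[i,k[ \wedge Transp^{*\,\forall}_{(p,i)}(p[i,k[) \\
\stackrel{\text{(Def. 10.3.1(1))}}{\Rightarrow} & \forall p \in IP[s^*,e^*]\; \forall k.\; p_k = n \Rightarrow \exists\, i \leq k.\; \big( (\exists\, j < i.\; Comp(p_j) \wedge SameInc_p[j,i[ \wedge Transp^{*\,\forall}_{(p,j)}(p[j,i[)) \\
& \qquad \vee\; (\exists\, j \geq i.\; Comp^*_{(p,i)}(p_j) \wedge Transp^{*\,\forall}_{(p,i)}(p[i,j[)) \big) \wedge SameInc_p[i,k[ \wedge Transp^{*\,\forall}_{(p,i)}(p[i,k[) \\
\stackrel{\text{(Def. 10.3.1(1))}}{\Rightarrow} & Safe(n)
\end{array}$$

□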

Up to now, the analogy to the intraprocedural setting seems to be complete at first sight. However, interprocedurally there is an essential difference between safety and strong safety. Whereas intraprocedurally both properties are equivalent (cf. Lemma 3.2.1), interprocedurally only the (trivial) implication stated by Lemma 10.3.3 holds.

Lemma 10.3.3 (Interprocedural Safety).

$$\forall n \in N^*.\; S\text{-}Safe(n) \Rightarrow Safe(n)$$




Fig. 10.15. Safety does not imply strong safety 



The backward implication of Lemma 10.3.3 is in general invalid, as already illustrated by the program fragment of Figure 10.6. For convenience, Figure 10.15 recalls the essential part of this example. Note that all nodes of procedure $\pi_2$ are safe, though none of them is up-safe or down-safe.

Thus, the interprocedural version of the correctness lemma is weaker than its intraprocedural counterpart, because intraprocedurally safety and strong safety coincide. In fact, the stronger implication

$$Correct_{ICM}(n) \Rightarrow S\text{-}Safe(n)$$

is in general invalid. This can be proved by means of Figure 10.15, too. Note that the start node of procedure $\pi_2$ is safe. After inserting an initialization at this node, all nodes of $\pi_2$ satisfy the predicate Correct, although none of these nodes is strongly safe.

In the following we focus on canonic code motion transformations. Canonicity is sufficient for the computational optimality of the IBCM-transformation.

Definition 10.3.3 (Canonic Transformations).

An admissible interprocedural code motion transformation $ICM \in \mathcal{ICM}_{Adm}$ is canonic if and only if for every node $n \in N^S$ the following condition holds:

$$Insert_{ICM}(n) \Rightarrow \forall p \in IP[s^*,e^*]\; \forall k.\; p_k = n \Rightarrow \exists\, k \leq i \leq \lambda_p.\; SameInc_p[k,i[ \wedge Replace_{ICM}(p_i) \wedge \big(\forall\, k+1 \leq j \leq i.\; Insert_{ICM}(p_j) \Rightarrow \neg SameInc_p[k,j[\big)$$

We denote the set of all canonic interprocedural code motion transformations by $\mathcal{ICM}_{Can}$. A program resulting from a transformation of $\mathcal{ICM}_{Can}$ will be called canonic, too.

10.3.2 Computationally Optimal Transformations 

Technically, the intraprocedural criterion of computational optimality can easily be extended to the interprocedural setting: a code motion transformation $ICM \in \mathcal{ICM}_{Adm}$ is computationally better^ than a code motion transformation $ICM' \in \mathcal{ICM}_{Adm}$ if and only if

$$\forall p \in IP[s^*,e^*].\; |\{\, i \mid Comp_{ICM}(p_i) \,\}| \leq |\{\, i \mid Comp_{ICM'}(p_i) \,\}|$$

Note that $Comp_{ICM}$ denotes a local predicate, which is true for nodes containing a computation of t after applying the code motion transformation it is annotated with, i.e.:

$$Comp_{ICM}(n) =_{df} Insert_{ICM}(n) \vee (Comp(n) \wedge \neg Replace_{ICM}(n))$$
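Because $Comp_{ICM}$ is a purely local predicate, the relation “computationally better” can be illustrated by counting computations along paths. Since $IP[s^*,e^*]$ is infinite in general, this sketch only compares two transformations on a finite, explicitly given set of paths, and it does not check admissibility; the function names, the pair encoding `(insert, replace)` and the example data are hypothetical.

```python
def comp_icm(n, insert, replace, comp):
    """Comp_ICM(n) = Insert_ICM(n) or (Comp(n) and not Replace_ICM(n))."""
    return n in insert or (n in comp and n not in replace)

def computations(path, icm, comp):
    """Number of positions i on the path with Comp_ICM(p_i); icm = (insert, replace)."""
    insert, replace = icm
    return sum(1 for n in path if comp_icm(n, insert, replace, comp))

def computationally_better(paths, icm, icm2, comp):
    """icm computes t at most as often as icm2 on every path of the given finite set."""
    return all(computations(p, icm, comp) <= computations(p, icm2, comp) for p in paths)

# Hypothetical example: t is originally computed at nodes 2 and 4.
comp = {2, 4}
paths = [[1, 2, 3, 4], [1, 3, 4]]
identity = (set(), set())          # no insertion, no replacement
hoisted = ({1}, {2, 4})            # insert h := t at node 1, replace at 2 and 4
```

On these two paths the hoisted variant is computationally better than the identity, but not vice versa; whether the insertion at node 1 is admissible is simply assumed here.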

This allows us to define:

Definition 10.3.4 (Comp. Optimal Interprocedural Code Motion).

An admissible interprocedural code motion transformation $ICM \in \mathcal{ICM}_{Adm}$ is computationally optimal if and only if it is computationally better than any other admissible interprocedural code motion transformation. We denote the set of all computationally optimal interprocedural code motion transformations by $\mathcal{ICM}_{CmpOpt}$.

Intraprocedurally, the set of computationally optimal transformations is never empty: each program has a computationally (and even a lifetime) optimal counterpart (cf. Corollary 3.4.1). Moreover, canonicity is necessary for computational optimality (cf. Theorem 3.2.1). As shown in Section 10.1.1, these theorems do not carry over to the interprocedural setting (cf. Theorem 10.1.1). We have:

Theorem 10.3.1 (Interprocedural Comp. Optimality and Canonicity).

1. There are programs for which $\mathcal{ICM}_{CmpOpt}$ is empty.

2. In general, $\mathcal{ICM}_{CmpOpt}$ is not a subset of $\mathcal{ICM}_{Can}$.

^ Note that this relation is reflexive like its intraprocedural counterpart. 
10.3.3 Lifetime Optimal Transformations

In this section we extend the notion of lifetime optimality to the interprocedural setting. Central is the introduction of the interprocedural notion of a lifetime range. In essence, interprocedural lifetime ranges are, like their intraprocedural counterparts, paths from insertion to replacement points. However, we have to take different incarnations of terms into account: nodes where, relative to the insertion point n, a different incarnation of the term under consideration is valid do not belong to a lifetime range starting in n, and must be excluded from the underlying path. Technically, this is accomplished by defining the set of (maximal) segments of a path, which are characterized by the fact that the same incarnation of t is valid at all nodes of a segment.

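Definition 10.3.5 (Interprocedural Path Segments).

We define:

$$\forall p \in IP[s^*,e^*]\; \forall\, 1 \leq i \leq j \leq \lambda_p.\; Segments_{(i,j)}(p) =_{df} \{\, p[i',j'] \mid (i \leq i' \leq j' \leq j) \wedge (\forall\, i' \leq l \leq j'.\; SameInc_p[i,l[) \wedge (i' > i \Rightarrow \neg SameInc_p[i,i'-1[) \wedge (j' < j \Rightarrow \neg SameInc_p[i,j'+1[) \,\}$$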
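Computing $Segments_{(i,j)}(p)$ amounts to collecting the maximal runs of positions l in [i, j] at which $SameInc_p[i,l[$ holds. In the sketch below (hypothetical names), `same_inc(l)` is assumed to return that truth value, and the segments are returned as index intervals rather than subpaths.

```python
def segments(same_inc, i, j):
    """Maximal intervals (i', j') with i <= i' <= j' <= j such that same_inc(l),
    i.e. SameInc_p[i, l[, holds for every l in [i', j']."""
    result, start = [], None
    for l in range(i, j + 1):
        if same_inc(l):
            if start is None:
                start = l            # a new segment begins
        elif start is not None:
            result.append((start, l - 1))
            start = None             # segment closed by a different incarnation
    if start is not None:
        result.append((start, j))
    return result

# Hypothetical truth values of SameInc_p[2, l[ on a path containing a recursive
# call at positions 4 to 6 and leaving the declaring procedure at position 10.
flags = {2: True, 3: True, 4: False, 5: False, 6: False,
         7: True, 8: True, 9: True, 10: False, 11: False}
```

For these flags the recursive call is cut out of the path: the segments are the intervals [2, 3] and [7, 9].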

This allows us to introduce the definition of interprocedural lifetime ranges.

Definition 10.3.6 (Interprocedural Lifetime Ranges).

Let $ICM \in \mathcal{ICM}_{Adm}$. We define:

$$LtRg(ICM) =_{df} \{\, Segments_{(i,j)}(p) \mid p \in IP[s^*,e^*] \wedge 1 \leq i \leq j \leq \lambda_p \wedge Insert_{ICM}(p_i) \wedge SameInc_p[i,j[ \wedge Replace_{ICM}(p_j) \wedge (\forall\, i < l \leq j.\; Insert_{ICM}(p_l) \Rightarrow \neg SameInc_p[i,l[) \,\}$$

By means of interprocedural lifetime ranges we can now define the interprocedural variant of the relation “lifetime better”. An interprocedural code motion transformation $ICM \in \mathcal{ICM}_{Adm}$ is lifetime better than an interprocedural code motion transformation $ICM' \in \mathcal{ICM}_{Adm}$ if and only if

$$\forall P \in LtRg(ICM)\; \exists Q \in LtRg(ICM').\; P \subseteq Q$$

where $\subseteq$ denotes the inclusion relation on sets of paths (cf. Section 2.1.1). Summarizing, we define:

Definition 10.3.7 (Lifetime Optimal Interprocedural Code Motion).

A computationally optimal interprocedural code motion transformation $ICM \in \mathcal{ICM}_{CmpOpt}$ is lifetime optimal if and only if it is lifetime better than any other computationally optimal interprocedural code motion transformation. We denote the set of all lifetime optimal interprocedural code motion transformations by $\mathcal{ICM}_{LtOpt}$.
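On a single path, lifetime ranges and the relation “lifetime better” can be sketched as follows. Each range is represented by the set of path positions covered by $Segments_{(i,j)}(p)$, i.e. the positions l between i and j with $SameInc_p[i,l[$; `same_inc(i, l)` is an assumed callback, and all names and the example data are hypothetical.

```python
def lifetime_ranges(n, insert, replace, same_inc):
    """Lifetime ranges on one path of length n (0-based), following Def. 10.3.6:
    insertion at i, replacement at j >= i of the same incarnation, and no
    re-insertion of that incarnation in ]i, j]. A range is the frozenset of
    positions l in [i, j] with same_inc(i, l), i.e. the nodes of Segments_(i,j)(p)."""
    ranges = set()
    for i in range(n):
        if i not in insert:
            continue
        for j in range(i, n):
            if (j in replace and same_inc(i, j)
                    and not any(l in insert and same_inc(i, l) for l in range(i + 1, j + 1))):
                ranges.add(frozenset(l for l in range(i, j + 1) if same_inc(i, l)))
    return ranges

def lifetime_better(ranges, ranges2):
    """Every lifetime range of the first transformation lies inside some range of the second."""
    return all(any(P <= Q for Q in ranges2) for P in ranges)

def one_inc(i, l):
    """Hypothetical path on which only a single incarnation of t exists."""
    return True

early = lifetime_ranges(5, {0}, {2, 4}, one_inc)   # insertion at position 0
late = lifetime_ranges(5, {1}, {2, 4}, one_inc)    # insertion at position 1
```

Inserting one position later shortens both lifetime ranges, so `late` is lifetime better than `early`, but not conversely.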
Obviously, we have (cf. Theorem 10.3.1(1)):

Theorem 10.3.2 (Interprocedural Lifetime Optimality).

There are programs for which $\mathcal{ICM}_{LtOpt}$ is empty.

This is in contrast to the intraprocedural case. However, the following facts carry over to the interprocedural setting: if $\mathcal{ICM}_{CmpOpt} \neq \emptyset$, then there is usually more than one computationally optimal transformation, i.e., $|\mathcal{ICM}_{CmpOpt}| > 1$. Lifetime optimality, however, can be achieved by at most one transformation (cf. Section 3.2.3). Note that the corresponding theorem relies only on properties of lifetime ranges and of computational optimality; canonicity is not required.

Theorem 10.3.3 (Uniqueness of Lifetime Optimal Interproc. CM).

$$|\mathcal{ICM}_{LtOpt}| \leq 1$$

Proof. Let $ICM, ICM' \in \mathcal{ICM}_{LtOpt}$. Then we have to prove two equivalences:

$$\forall n \in N^S.\; Insert_{ICM}(n) \iff Insert_{ICM'}(n) \qquad (10.1)$$

$$\forall n \in N^S.\; Replace_{ICM}(n) \iff Replace_{ICM'}(n) \qquad (10.2)$$

For symmetry reasons it is sufficient to prove only one direction of these equivalences. Starting with (10.1), the computational optimality of ICM guarantees

$$\forall n \in N^S.\; Insert_{ICM}(n) \Rightarrow \exists m \in N^S\; \exists p \in IP[s^*,e^*]\; \exists i,j.\; p_i = n \wedge p_j = m \wedge Segments_{(i,j)}(p) \in LtRg(ICM)$$

Moreover, the lifetime optimality of ICM yields that there are indexes i' and j' with

$$i' \leq i \leq j \leq j'$$

and

$$Segments_{(i,j)}(p) \subseteq Segments_{(i',j')}(p) \in LtRg(ICM')$$

Suppose $i' < i$. Since ICM' is also lifetime optimal, there must be indexes i'' and j'' with

$$i'' \leq i' < i \leq j \leq j' \leq j''$$

and

$$Segments_{(i,j)}(p) \subseteq Segments_{(i',j')}(p) \subseteq Segments_{(i'',j'')}(p) \in LtRg(ICM)$$

In particular, we have

$$Segments_{(i,j)}(p),\; Segments_{(i'',j'')}(p) \in LtRg(ICM)$$

and

$$Segments_{(i,j)}(p) \subsetneq Segments_{(i'',j'')}(p)$$

Hence, Definition 10.3.6 delivers

$$\neg Insert_{ICM}(n)$$

This, however, is a contradiction to the premise of the implication under consideration. Thus $i' = i$, and since $Segments_{(i',j')}(p) \in LtRg(ICM')$, Definition 10.3.6 yields, as desired, $Insert_{ICM'}(n)$.

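In order to prove (10.2), let $n \in N^S$. Then (10.2) is a consequence of the following chain of implications:

$$\begin{array}{ll}
& Replace_{ICM}(n) \\
\stackrel{(ICM \in \mathcal{ICM}_{Adm})}{\Rightarrow} & Comp(n) \wedge Correct_{ICM}(n) \\
\stackrel{(10.1)}{\Rightarrow} & Comp(n) \wedge Correct_{ICM'}(n) \\
\stackrel{(ICM' \in \mathcal{ICM}_{CmpOpt})}{\Rightarrow} & Replace_{ICM'}(n)
\end{array}$$

□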

Continuing the analogy to the intraprocedural setting, we next introduce the notion of interprocedural first-use-lifetime ranges, which is important for the optimality proofs of the IBCM- and ILCM-transformation.

Definition 10.3.8 (Interprocedural First-Use-Lifetime Ranges).

Let $ICM \in \mathcal{ICM}_{Adm}$. We define

$$FU\text{-}LtRg(ICM) =_{df} \{\, P \in LtRg(ICM) \mid \forall Q \in LtRg(ICM).\; Q \subseteq P \Rightarrow Q = P \,\}$$

We have:

Lemma 10.3.4 (Interprocedural First-Use-Lifetime Range Lemma).

Let $ICM \in \mathcal{ICM}_{Adm}$, $p \in IP[s^*,e^*]$, and $Q_1, Q_2 \in FU\text{-}LtRg(ICM)$ with $Q_1 \sqsubseteq p$ and $Q_2 \sqsubseteq p$. Then either

– $Q_1 = Q_2$, or

– $Q_1$ and $Q_2$ are disjoint, i.e., they do not have any node occurrence in common.


10.4 The IBCM-Transformation

In this section we present the interprocedural version of the busy code motion transformation, called the IBCM-transformation. Like its intraprocedural counterpart, it is based on the properties of down-safety and earliestness. We will prove that it is always admissible, and that it is computationally optimal whenever it is canonic for the program under consideration.
10.4.1 Specification

Intuitively, a node n is interprocedurally down-safe at its entry if, on every terminating program path starting in n, the first modification of t is preceded by a computation of t. Analogously, it is interprocedurally down-safe at its exit if, on every terminating path starting in a successor of n, the first modification of t is preceded by a computation of t. Note that the definition of the corresponding intraprocedural property relies on the same intuition (cf. Definition 3.3.1). However, the definition of interprocedural down-safety must be refined in order to take care of the fact that there are different incarnations of t.

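Definition 10.4.1 (Interprocedural Down-Safety).

Let $n \in N^*$. n is interprocedurally

1. entry-down-safe [in signs: $N\text{-}DSafe^*(n)$] $\iff_{df}$

$$\forall p \in IP[s^*,e^*]\; \forall i.\; p_i = n \Rightarrow \exists\, i \leq j \leq \lambda_p.\; Comp^*_{(p,i)}(p_j) \wedge Transp^{*\,\forall}_{(p,i)}(p[i,j[)$$

2. exit-down-safe [in signs: $X\text{-}DSafe^*(n)$] $\iff_{df}$

$$\forall p \in IP[s^*,e^*]\; \forall i.\; p_i = n \Rightarrow \exists\, i < j \leq \lambda_p.\; Comp(p_j) \wedge SameInc_p]i,j[ \wedge Transp^{*\,\forall}_{(p,i+1)}(p]i,j[)$$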



Note that the program point which corresponds to the entry of a return node n of G* in S is the entry point of the unique successor of n. This implies that the entry of a return node of G* must not be an insertion point of any interprocedural code motion transformation. This is automatically taken care of by the IDFA-algorithms of the IBCM-transformation. The specification here, however, requires a slightly modified version of the predicate N-DSafe*, denoted by I-DSafe. A node $n \in N^*$ is interprocedurally down-safe if it satisfies the predicate I-DSafe defined by

$$\forall n \in N^*.\; I\text{-}DSafe(n) =_{df} \begin{cases} N\text{-}DSafe^*(n) & \text{if } n \notin N^*_r \\ N\text{-}DSafe^*(pred^*(n)) & \text{otherwise} \end{cases}$$

Considering a return node n, the predicates I-DSafe and N-DSafe* coincide if all “siblings” of n have the same truth value. Using I-DSafe instead of N-DSafe* guarantees that return nodes never satisfy the predicate “earliest”, and therefore do not occur as insertion points of the IBCM-transformation.



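Definition 10.4.2 (Interprocedural Earliestness).

Let $n \in Range(t)$. n is interprocedurally

1. entry-earliest [in signs: $N\text{-}Earliest^*(n)$] $\iff_{df}$

$$\exists p \in IP[s^*,n]\; \forall\, 1 \leq i < \lambda_p.\; I\text{-}DSafe(p_i) \wedge SameInc_p[i,\lambda_p[ \Rightarrow \neg Transp^{*\,\forall}_{(p,i)}(p[i,\lambda_p[)$$

2. exit-earliest [in signs: $X\text{-}Earliest^*(n)$] $\iff_{df}$

$$\exists p \in IP[s^*,n]\; \forall\, 1 \leq i \leq \lambda_p.\; I\text{-}DSafe(p_i) \wedge SameInc_p[i,\lambda_p] \Rightarrow \neg Transp^{*\,\forall}_{(p,i)}(p[i,\lambda_p])$$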
Abbreviating N-Earliest* by I-Earliest, Lemma 10.4.1 yields that the IBCM-transformation does not insert a computation into a return node or into the fictitious procedure $G_0$ representing the external procedures of the program under consideration. This lemma follows immediately from the definition of the predicate I-DSafe and the local abstract semantics of end nodes and of return nodes that are successors of the end node of $G_0$.^ A dual lemma holds for the ILCM-transformation. For the IBCM-transformation it reads as follows:

Lemma 10.4.1. $\forall n \in N^*.\; I\text{-}DSafe(n) \wedge I\text{-}Earliest(n) \Rightarrow n \in N^* \setminus (N^*_r \cup \{s_0, n_0, e_0\})$

Table 10.2 now presents the definitions of the predicates $Insert_{IBCM}$ and $Replace_{IBCM}$, which define the IBCM-transformation.



• $\forall n \in N^S.\; Insert_{IBCM}(n) =_{df} I\text{-}DSafe(n) \wedge I\text{-}Earliest(n)$

• $\forall n \in N^S.\; Replace_{IBCM}(n) =_{df} Comp(n)$

Table 10.2: The IBCM-transformation

The IBCM-transformation is admissible. We prove this property next, although this requires anticipating the second part of the IBCM-Lemma 10.4.2. However, this does not cause any subtleties, as that lemma is independent of the admissibility theorem. We have:

Theorem 10.4.1 (IBCM-Admissibility).

$$IBCM \in \mathcal{ICM}_{Adm}$$

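Proof. In order to prove $IBCM \in \mathcal{ICM}_{Adm}$, we must show that all insertions and replacements are safe and correct, respectively. The safety of insertions is a consequence of:

$$\begin{array}{ll}
& Insert_{IBCM} \\
\stackrel{\text{(Def. } Insert_{IBCM})}{\Rightarrow} & I\text{-}DSafe \\
\stackrel{\text{(Def. } I\text{-}DSafe)}{\Rightarrow} & N\text{-}DSafe^* \\
\stackrel{\text{(Def. } S\text{-}Safe)}{\Rightarrow} & S\text{-}Safe \\
\stackrel{\text{(Lemma 10.3.3)}}{\Rightarrow} & Safe
\end{array}$$

The correctness of replacements follows from: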



^ Recall that nodes $e \in \{e_0, e_1, \ldots, e_k\}$ represent the empty statement “skip”.
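$$\begin{array}{ll}
& Replace_{IBCM} \\
\stackrel{\text{(Def. } Replace_{IBCM})}{\Rightarrow} & Comp \\
\stackrel{\text{(Def. } N\text{-}DSafe^*)}{\Rightarrow} & N\text{-}DSafe^* \\
\stackrel{\text{(Def. } S\text{-}Safe)}{\Rightarrow} & S\text{-}Safe \\
\stackrel{\text{(Lemma 10.3.3)}}{\Rightarrow} & Safe \\
\stackrel{\text{(Lemma 10.4.2(2))}}{\Rightarrow} & Correct_{IBCM}
\end{array}$$

□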

Reestablishing Formal Procedure Calls. In this section we discuss how to reestablish the formal procedure calls in S after the IBCM-transformation, a step which is necessary because formal procedure calls have been unfolded before the transformation (cf. Section 10.2). We illustrate this procedure by means of Figure 10.14. In order to fix the setting, let $M =_{df} \{\, n(\pi) \mid \pi \in callee(n) \,\}$ be the set of nodes introduced for replacing a formal procedure call node n, and let $M' =_{df} \{\, n'(\pi) \mid \pi \in callee(n) \,\}$ be the set of immediate successors of nodes of M. Because edges leading to nodes with more than one predecessor have been split, all nodes of M' are synthetic and represent the empty statement “skip”. In particular, for every $\pi \in callee(n)$, we assume that $n'(\pi)$ denotes the unique successor of $n(\pi)$, and $n(\pi)$ the unique predecessor of $n'(\pi)$. Moreover, we have $|pred_{fg(M)}(M)| = |succ_{fg(M')}(M')| = 1$. In this situation the formal procedure call is reestablished as displayed in Figure 10.16: the sets M and M' of nodes are replaced by the original node n and a new node n', respectively. Correspondingly, the set of edges reaching a node in M is replaced by a single edge from the unique predecessor p of the nodes of M to n, and the set of edges leaving a node of M is replaced by a single edge from n to n'. Moreover, the set of edges leaving a node of M' is replaced by a single edge from n' to the unique successor s of the nodes of M' (cf. Figure 10.16(a)). Having settled the graph structure, we are left with fixing the predicates Insert and Replace for n and n' with respect to the predicate values applying to the nodes of M and M'. Starting with the predicate Insert, this is straightforward if all or none of the nodes of M, and likewise of M', satisfy the predicate $Insert_{IBCM}$, i.e., if the equalities $|\{\, Insert_{IBCM}(n(\pi)) \mid \pi \in callee(n) \,\}| = |\{\, Insert_{IBCM}(n'(\pi)) \mid \pi \in callee(n) \,\}| = 1$ hold. In these cases we define

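$$Insert_{IBCM}(m) =_{df} \begin{cases} \bigwedge_{\pi \in callee(n)} Insert_{IBCM}(n(\pi)) & \text{if } m = n \\ \bigwedge_{\pi \in callee(n)} Insert_{IBCM}(n'(\pi)) & \text{if } m = n' \end{cases}$$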

The remaining cases are slightly more complicated:

If |{Insert_IBCM(n(π)) | π ∈ callee(n)}| = 2, a new edge must be
introduced between n and its unique predecessor p, which is split by a new
node h as illustrated in Figure 10.16(b). In this case, Insert_IBCM(n) is set
to false and Insert_IBCM(h) to true. Note that the new branch in p is
deterministic, i.e., the branch to h is taken if and only if the argument
procedure bound to the parameter procedure of node n is an element of
{π ∈ callee(n) | Insert_IBCM(n(π))}. Analogously, a new branch must be inserted
at the end of node n' in case of |{Insert_IBCM(n'(π)) | π ∈ callee(n)}| = 2.
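The decision rule for the Insert predicate of the coalesced node n can be sketched in a few lines. This is a minimal Python illustration, not part of the original construction. It assumes that the values Insert_IBCM(n(π)) are given as a dictionary keyed by hypothetical procedure names. The function name `restore_insert` and the returned "branch set" are illustrative choices. The branch set holds the procedures for which the deterministic branch to the new node h is taken.

```python
def restore_insert(values):
    """Sketch: combine Insert_IBCM(n(pi)) over all pi in callee(n).

    values maps a (hypothetical) procedure name pi to the truth value of
    Insert_IBCM(n(pi)). Returns (insert_at_n, branch_procs), where
    branch_procs is the set of procedures for which the new deterministic
    branch to the split node h (which carries the insertion) is taken.
    """
    if len(set(values.values())) == 1:
        # All or none of the nodes of M insert: take the conjunction.
        return all(values.values()), frozenset()
    # Mixed case: n itself does not insert; h inserts on the branch that is
    # selected when the bound argument procedure lies in branch_procs.
    return False, frozenset(pi for pi, v in values.items() if v)
```

The same rule applies to n' and the nodes of M', with the new branch placed at the end of n' instead of before n.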




Fig. 10.16. Reestablishing formal procedure calls after the IBCM-transformation



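Finally, the predicates Replace_IBCM(n) and Replace_IBCM(n') have to be
set as follows:¹

$$\mathit{Replace}_{IBCM}(m) =_{df} \begin{cases} \bigwedge_{\pi \in \mathit{callee}(n)} \mathit{Replace}_{IBCM}(n(\pi)) & \text{if } m = n \\ \mathit{false} & \text{if } m = n' \end{cases}$$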

Note that the number of computations on a program path of IP[s*,e*]
is invariant under the restoration of formal procedure calls. Without loss of
generality we can thus prove the computational optimality of the
IBCM-transformation before formal procedure calls are reestablished.

¹ Note that n' represents the empty statement “skip”, which implies ¬Comp(n').

Checking Canonicity. The IBCM-transformation is always admissible (cf.
Theorem 10.4.1). Under the premise of canonicity it is also computationally
optimal, and canonicity can easily be checked by investigating the
insertion points. For a specific insertion this can be accomplished in time
linear in the number of nodes of the program by a simple marking algorithm
which, starting at the insertion point under consideration, recursively marks
all nodes reachable from it until it stops either at a use site or at a
reinsertion of the computation under consideration of the same incarnation.
The insertion is canonic if the marking stops at use sites only. Note that even
in the presence of recursive procedure calls the algorithm does not have to
visit nodes twice: if the term contains a variable local to the callee, the
call can simply be skipped, as entering the callee activates a new incarnation
of the term. Thus, as a consequence of the down-safety of the insertion point
under consideration, the use site justifying down-safety follows the
recursive call. Alternatively and more efficiently, canonicity of the
IBCM-transformation can also be checked in the style of the “unusability”
analysis of Section 10.5.1 using the dual version of this property.
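The marking check can be sketched for the intraprocedural special case only. The sketch ignores procedure calls and incarnations, which is an assumption made purely for illustration. The graph is a hypothetical successor dictionary. `uses` are the nodes computing the term, and `inserts` are the nodes at whose entry the transformation inserts a computation. An insertion at the entry of a node precedes that node's own computation, so a reinsertion is checked before a use. Each node is marked at most once, so the check is linear in the size of the graph.

```python
from collections import deque


def is_canonic_insertion(succ, start, uses, inserts):
    """Marking check: True iff every path leaving the insertion at `start`
    reaches a use site before it reaches another insertion.

    succ: dict node -> list of successor nodes (hypothetical encoding)
    uses: nodes that compute the term
    inserts: nodes with an insertion at their entry (including start)
    """
    if start in uses:
        return True  # the inserted value is consumed by start itself
    marked = {start}
    work = deque([start])
    while work:
        node = work.popleft()
        if not succ.get(node):
            # An exit reached without a use; excluded by down-safety,
            # reported here as non-canonic.
            return False
        for s in succ[node]:
            if s in inserts:
                return False  # marking stops at a reinsertion: not canonic
            if s in uses:
                continue  # marking stops at a use site: fine
            if s not in marked:
                marked.add(s)
                work.append(s)
    return True
```

For example, with edges 1→2, 2→3, 2→4, 3→5, 4→5, an insertion at node 1 with uses at 3 and 4 is canonic. An additional insertion at node 4 makes it non-canonic.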



10.4.2 Proving Computational Optimality 

In this section we assume that the IBCM-transformation is canonic for the
program under consideration. Under this premise, we will prove that it is
interprocedurally computationally optimal (cf. IBCM-Optimality Theorem
10.4.2). In essence, this can be proved as in the intraprocedural case. Central
is the IBCM-Lemma 10.4.2. Note that only its third part, which intuitively
states that every insertion of the IBCM-transformation is used on every program
continuation without a preceding modification or reinsertion, relies on the
canonicity of the IBCM-transformation. This part, in fact, is crucial for the
proof of computational optimality.

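Lemma 10.4.2 (IBCM-Lemma).

1.
$$\forall n \in N^*.\ \mathit{Insert}_{IBCM}(n) \iff \mathit{S\text{-}Safe}(n) \wedge \begin{cases} \bigwedge_{m \in \mathit{caller}(fg(n))} \neg\mathit{Transp}(m_c) & \text{if } n \in \{s_1, \dots, s_k\} \\ \bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big) & \text{if } \emptyset \neq \mathit{pred}_{fg(n)}(n) \subseteq N^* \setminus N_c^* \ \vee \\ & \quad \big(\mathit{pred}_{fg(n)}(n) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{callee}(m))\big) \\ \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big) \wedge \neg\mathit{Safe}(\mathit{end}(\mathit{callee}(m))) & \text{if } \mathit{pred}_{fg(n)}(n) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{callee}(m)) \end{cases}$$

2.
$$\forall n \in N^*.\ \mathit{Correct}_{IBCM}(n) \iff \mathit{Safe}(n)$$

3. If IBCM ∈ ICM_Can, then we have:
$$\forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ \mathit{Insert}_{IBCM}(p_i) \Rightarrow \exists j \geq i.\ \mathit{Segments}_{(i,j)}(p) \in \mathit{FU\text{-}LtRg}(\mathit{IBCM})$$

4.
$$\forall \mathit{ICM} \in \mathcal{ICM}_{Adm}\ \forall p \in \mathbf{IP}[s^*,e^*]\ \forall i,j.\ \mathit{Segments}_{(i,j)}(p) \in \mathit{LtRg}(\mathit{IBCM}) \Rightarrow \neg\mathit{Replace}_{ICM}(p_j) \vee \exists i \leq l \leq j.\ \mathit{Insert}_{ICM}(p_l) \wedge \langle p_l \rangle \subseteq \mathit{Segments}_{(i,j)}(p)$$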
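Proof. Part 1). Starting with the proof of the first implication, “⇒”, we have
that Insert_IBCM implies I-DSafe, and I-DSafe implies S-Safe. Thus, we
are left with showing the implication

$$\mathit{Insert}_{IBCM}(n) \Rightarrow \begin{cases} \bigwedge_{m \in \mathit{caller}(fg(n))} \neg\mathit{Transp}(m_c) & \text{if } n \in \{s_1, \dots, s_k\} \\ \bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big) & \text{if } \emptyset \neq \mathit{pred}_{fg(n)}(n) \subseteq N^* \setminus N_c^* \ \vee \\ & \quad \big(\mathit{pred}_{fg(n)}(n) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{callee}(m))\big) \\ \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big) \wedge \neg\mathit{Safe}(\mathit{end}(\mathit{callee}(m))) & \text{if } \mathit{pred}_{fg(n)}(n) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{callee}(m)) \end{cases} \tag{10.3}$$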

As the main procedure cannot be called, we have caller(fg(s_1)) = ∅. This
directly implies the validity of implication (10.3) for n = s_1. Next, let
n ∈ {s_2, …, s_k}, and assume that

$$\bigwedge_{m \in \mathit{caller}(fg(n))} \neg\mathit{Transp}(m_c)$$

is invalid. As all procedure calls in S are ordinary, this assumption implies

$$\bigwedge_{m \in \mathit{caller}(fg(n))} \mathit{Transp}(m_c)$$

which yields

$$\forall p \in \mathbf{IP}[s^*,n].\ \mathit{SameInc}_p[\lambda_p - 1, \lambda_p[$$

Combining this with I-DSafe(n) we get

$$\bigwedge_{m \in \mathit{caller}(fg(n))} \mathit{I\text{-}DSafe}(m)$$

and therefore, by means of Definition 10.4.2, ¬I-Earliest(n), which
contradicts Insert_IBCM(n).

We are now left with proving implication (10.3) for n ∈ N* \ {s_1, …, s_k}.
First, we show that n has a unique predecessor, which is accomplished by
proving the contrapositive of

$$\forall n \in N^* \setminus \{s_1, \dots, s_k\}.\ \mathit{Insert}_{IBCM}(n) \Rightarrow |\mathit{pred}_{fg(n)}(n)| = 1 \tag{10.4}$$

Thus, let n ∈ N* \ {s_1, …, s_k}, and suppose that |pred_fg(n)(n)| ≥ 2. Lemma
10.2.2(1a) and Corollary 10.2.1(1) then yield

$$\mathit{succ}_{fg(n)}(\mathit{pred}_{fg(n)}(n)) = \{n\} \wedge \bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{Transp}(m)$$
Moreover, we have

$$\forall p \in \mathbf{IP}[s^*,n].\ \mathit{SameInc}_p[\lambda_p - 1, \lambda_p[$$

Hence, I-DSafe(n) implies

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{I\text{-}DSafe}(m) \tag{10.5}$$

By means of (10.5), Definition 10.4.2 yields ¬I-Earliest(n). This directly
implies ¬Insert_IBCM(n), as desired. Thus, n has a unique predecessor, which
we denote by m.

In order to complete the proof of implication (10.3), we must investigate
three cases. Without loss of generality we can assume that m satisfies the
predicate GlobTransp.

Case 1. m ∈ N* \ N_c*

In this case GlobTransp(m) is equivalent to Transp(m). Moreover, for all
p ∈ IP[s*,n] we have SameInc_p[λ_p − 1, λ_p[. Suppose that m satisfies the
predicate Safe. According to Definition 10.3.1 this implies

$$\begin{aligned} \forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ (p_i = m) \Rightarrow\ & \text{a) } \exists j < i.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[) \ \vee \\ & \text{b) } \exists j \geq i.\ \mathit{Comp}^*_{(p_i)}(p_j) \wedge \mathit{Transp}^*_{(p_i)}(p[i,j[) \end{aligned} \tag{10.6}$$
The proof now proceeds by showing that I-Earliest(n) does not hold, which,
as desired, yields a contradiction to Insert_IBCM(n). This can be achieved by
considering each path passing node n separately. Thus, let p ∈ IP[s*,e*]
and let i ∈ {1, …, λ_p} with p_i = n. If there is an index j < i satisfying
condition a) of (10.6), i.e., in case of a)

$$\exists j < i.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[)$$

we immediately get the equivalent formula

$$\exists j < i.\ \mathit{I\text{-}DSafe}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[)$$

According to Definition 10.4.2, p cannot be used for justifying the predicate
I-Earliest(n). Thus, we are left with the case of condition b), i.e.,

$$\exists j \geq i.\ \mathit{Comp}^*_{(p_i)}(p_j) \wedge \mathit{Transp}^*_{(p_i)}(p[i,j[)$$

Without loss of generality we can assume

$$\forall l < i.\ \mathit{Comp}(p_l) \Rightarrow \neg\big(\mathit{SameInc}_p[l,i[ \wedge \mathit{Transp}^*_{(p_l)}(p[l,i[)\big) \tag{10.7}$$

because otherwise we succeed as above. Moreover, without loss of generality
we can additionally assume that

$$\forall l < i.\ \mathit{I\text{-}DSafe}(p_l) \Rightarrow \neg\big(\mathit{SameInc}_p[l,i[ \wedge \mathit{Transp}^*_{(p_l)}(p[l,i[)\big) \tag{10.8}$$

because, according to Definition 10.4.2, the existence of an index l violating
(10.8) would exclude the usage of p for justifying I-Earliest(n). Together,
however, (10.7) and (10.8) imply the existence of a path passing node m that
violates both constraints a) and b) of (10.6). This contradicts the
safety of m, which finishes the proof of Case 1.

Case 2. m ∈ N_c* ∧ Var(t) ⊄ GlobVar(callee(m))

In this case Lemma 10.2.6(2) yields, for all paths p ∈ IP[s*,n] with

$$p[i, \lambda_p - 2] \in \mathbf{CIP}[\mathit{start}(\mathit{callee}(m)), \mathit{end}(\mathit{callee}(m))]$$

that the predicate SameInc_p[i − 1, λ_p[ is satisfied. Additionally, GlobTransp(m)
implies

$$\mathit{Transp}^*_{(p_{i-1})}(p[i-1, \lambda_p[)$$

Thus, as in Case 1, the assumption that m satisfies the predicate Safe
implies ¬I-Earliest(n), and therefore a contradiction to Insert_IBCM(n), which
completes the proof of Case 2.

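Case 3. m ∈ N_c* ∧ Var(t) ⊆ GlobVar(callee(m))

Here we have to prove the conjunction

$$\neg\mathit{Safe}(m) \wedge \neg\mathit{Safe}(\mathit{end}(\mathit{callee}(m)))$$

Clearly, ¬Safe(m) holds for the same reasons as in Case 2. Thus, we are left
with showing ¬Safe(end(callee(m))). Suppose that end(callee(m)) satisfies
the predicate Safe. This yields

$$\begin{aligned} \forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ (p_i = \mathit{end}(\mathit{callee}(m))) \Rightarrow\ & \text{a) } \exists j < i.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[) \ \vee \\ & \text{b) } \exists j \geq i.\ \mathit{Comp}^*_{(p_i)}(p_j) \wedge \mathit{Transp}^*_{(p_i)}(p[i,j[) \end{aligned} \tag{10.9}$$

and the proof can now be completed as in Case 1, using end(callee(m))
instead of m.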

The converse implication, “⇐”, is proved as follows. If n = s_1, S-Safe(n)
is equivalent to N-DSafe*(n). This directly implies I-DSafe(n) and
t ∈ Range(n). Moreover, Definition 10.4.2 yields N-Earliest*(n), which is
equivalent to I-Earliest(n). Hence, we obtain as desired

$$\mathit{Insert}_{IBCM}(n)$$

Let now n ∈ {s_2, …, s_k}. Then

$$\bigwedge_{m \in \mathit{caller}(fg(n))} \neg\mathit{Transp}(m_c) \tag{10.10}$$

yields

$$\mathit{N\text{-}DSafe}^*(n)$$

Since n ∈ {s_2, …, s_k}, this is equivalent to I-DSafe(n). Moreover, this
yields t ∈ Range(n). Thus, we are left with showing that I-Earliest(n) holds,
too. Clearly, (10.10) yields

$$\forall p \in \mathbf{IP}[s^*,n]\ \forall i < \lambda_p.\ \neg\mathit{SameInc}_p[i, \lambda_p[$$

Hence, applying Definition 10.4.2(1), we get I-Earliest(n) as desired.

We are now left with the case n ∈ N* \ {s_1, …, s_k}. As for the first
implication, we prove that n has exactly one predecessor. This is done by
showing the contrapositive of

$$\forall n \in N^* \setminus \{s_1, \dots, s_k\}.\ \Big(\mathit{S\text{-}Safe}(n) \wedge \bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big)\Big) \Rightarrow |\mathit{pred}_{fg(n)}(n)| = 1 \tag{10.11}$$

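To this end let n ∈ N* \ {s_1, …, s_k}, and suppose that

$$|\mathit{pred}_{fg(n)}(n)| \geq 2$$

Lemma 10.2.2(1a) and Corollary 10.2.1(1) then yield

$$\mathit{succ}_{fg(n)}(\mathit{pred}_{fg(n)}(n)) = \{n\} \wedge \bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{Transp}(m)$$

In particular, we therefore have

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{GlobTransp}(m)$$

Without loss of generality we can assume (recall that n satisfies S-Safe)

$$\mathit{N\text{-}USafe}^*(n) \wedge \neg\mathit{N\text{-}DSafe}^*(n) \tag{10.12}$$

because N-DSafe*(n) and the transparency of n's predecessors yield

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{S\text{-}Safe}(m)$$

This implies

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{Safe}(m)$$

and therefore, as desired, the negation of the premise of (10.11). Now (10.12)
implies

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \big(\mathit{N\text{-}USafe}^*(m) \vee \mathit{Comp}(m)\big)$$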
Clearly, Comp implies N-DSafe*. Hence, we obtain

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \big(\mathit{N\text{-}USafe}^*(m) \vee \mathit{N\text{-}DSafe}^*(m)\big)$$

and therefore

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{S\text{-}Safe}(m)$$

implying

$$\bigwedge_{m \in \mathit{pred}_{fg(n)}(n)} \mathit{Safe}(m)$$

as well. This yields, as desired, the negation of the premise of (10.11). Hence,
n has a unique predecessor, which we denote by m. Next, we must investigate
the following three cases.

Case 1. m ∈ N* \ N_c*

In this case the predicates GlobTransp and Transp are equivalent for m.
Additionally, we have:

$$\forall p \in \mathbf{IP}[s^*,n].\ \mathit{SameInc}_p[\lambda_p - 1, \lambda_p[ \tag{10.13}$$

Moreover, we also have

$$\mathit{N\text{-}USafe}^*(n) \vee \mathit{N\text{-}DSafe}^*(n)$$

because n satisfies the predicate S-Safe. Suppose now that n satisfies
N-USafe*. Applying the definition of N-USafe* we obtain

$$\forall p \in \mathbf{IP}[s^*,n]\ \exists j < \lambda_p.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j, \lambda_p[ \wedge \mathit{Transp}^*_{(p_j)}(p[j, \lambda_p[) \tag{10.14}$$

Together with (10.13) this implies Transp(m). Thus, by means of the premise
of “⇐” we get

$$\neg\mathit{Safe}(m) \tag{10.15}$$

In particular, this yields ¬Comp(m). Hence, (10.14) is equivalent to

$$\forall p \in \mathbf{IP}[s^*,n]\ \exists j < \lambda_p - 1.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j, \lambda_p - 1[ \wedge \mathit{Transp}^*_{(p_j)}(p[j, \lambda_p - 2])$$

which directly yields

$$\mathit{N\text{-}USafe}^*(m)$$

and hence

$$\mathit{Safe}(m)$$

This, however, is a contradiction to (10.15). Thus, we have N-DSafe*(n), and
therefore also I-DSafe(n). Hence, we are left with showing I-Earliest(n). If
Transp(m) is invalid, I-Earliest(n) holds trivially by means of (10.13) and
Definition 10.4.2(1). Otherwise, i.e., if Transp(m) and ¬Safe(m) hold, the
assumption ¬I-Earliest(n) implies N-USafe*(m). This, however, yields a
contradiction to the premise ¬Safe(m). Together, this implies Insert_IBCM(n)
as desired.

Case 2. m ∈ N_c* ∧ Var(t) ⊄ GlobVar(callee(m))

Suppose that S-Safe(n) does not imply N-DSafe*(n), i.e., suppose we have

$$\mathit{N\text{-}USafe}^*(n) \wedge \neg\mathit{N\text{-}DSafe}^*(n)$$

Applying the definition of N-USafe* and Lemma 10.2.6(2) we obtain

$$\forall p \in \mathbf{IP}[s^*,n]\ \exists j < i'.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j, \lambda_p[ \wedge \mathit{Transp}^*_{(p_j)}(p[j, \lambda_p[) \tag{10.16}$$

where p_i' is the call node matching p_(λ_p − 1) on p. Together with Lemma
10.2.7 this delivers

$$\mathit{N\text{-}USafe}^*(p_{i'}) \vee \mathit{Comp}(p_{i'})$$

Since Comp implies N-DSafe*, we obtain

$$\mathit{N\text{-}USafe}^*(p_{i'}) \vee \mathit{N\text{-}DSafe}^*(p_{i'})$$

Thus, we have

$$\mathit{S\text{-}Safe}(p_{i'})$$

and therefore

$$\mathit{S\text{-}Safe}(m)$$

which implies

$$\mathit{Safe}(m)$$

Moreover, (10.16) directly yields

$$\mathit{GlobTransp}(m)$$

This, however, is a contradiction to the premise of “⇐”. Thus, we have
N-DSafe*(n), and therefore also I-DSafe(n). Now, I-Earliest(n) remains
to be verified. If GlobTransp(m) does not hold, I-Earliest(n) holds
trivially according to Definition 10.4.2(1). Otherwise, i.e., if GlobTransp(m)
and ¬Safe(m) hold, the assumption ¬I-Earliest(n) directly yields N-USafe*(m),
and therefore a contradiction to the premise ¬Safe(m). Thus, we obtain
Insert_IBCM(n) as desired.

Case 3. m ∈ N_c* ∧ Var(t) ⊆ GlobVar(callee(m))

Similar to Case 2, the assumption

$$\mathit{N\text{-}USafe}^*(n) \wedge \neg\mathit{N\text{-}DSafe}^*(n)$$

leads to a contradiction to the premise that end(callee(m)) does not
satisfy the predicate Safe. Thus, we have N-DSafe*(n), and therefore also
I-DSafe(n). Additionally, if

$$\neg\mathit{Safe}(\mathit{end}(\mathit{callee}(m))) \wedge \neg\mathit{GlobTransp}(m)$$

is satisfied, Definition 10.4.2(1) directly yields I-Earliest(n). Otherwise,
i.e., if

$$\neg\mathit{Safe}(\mathit{end}(\mathit{callee}(m))) \wedge \neg\mathit{Safe}(m)$$

holds, the assumption ¬I-Earliest(n) implies N-USafe*(end(callee(m))),
and therefore a contradiction to the premise ¬Safe(end(callee(m))). Hence,
in both cases we obtain Insert_IBCM(n) as desired.

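Part 2). The first implication, “⇒”, holds by means of the Correctness Lemma
10.3.2. Thus, we are left with showing the second implication, “⇐”:

$$\forall n \in N^*.\ \mathit{Safe}(n) \Rightarrow \mathit{Correct}_{IBCM}(n) \tag{10.17}$$

This is equivalent to

$$\begin{aligned} &\Big(\forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ (p_i = n) \Rightarrow \big(\text{a) } \exists j < i.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[)\ \vee \\ &\qquad \text{b) } \exists j \geq i.\ \mathit{Comp}^*_{(p_i)}(p_j) \wedge \mathit{Transp}^*_{(p_i)}(p[i,j[)\big)\Big) \\ &\Rightarrow \Big(\forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ (p_i = n) \Rightarrow \exists l \leq i.\ \mathit{Insert}_{IBCM}(p_l) \wedge \mathit{SameInc}_p[l,i[ \wedge \mathit{Transp}^*_{(p_l)}(p[l,i[)\Big) \end{aligned} \tag{10.18}$$

Implication (10.18) is now proved by investigating every path p separately,
by a case analysis on the position of j. If there is an index j satisfying
condition a) of (10.18), then, because Comp implies I-DSafe, condition a) is
equivalent to

$$\exists j < i.\ \mathit{I\text{-}DSafe}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[) \tag{10.19}$$

Thus, the index l_p,

$$l_p =_{df} \mathit{Min}\big(\{\, j \mid \mathit{I\text{-}DSafe}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[) \,\}\big) \tag{10.20}$$

is well-defined. Together with Definition 10.4.2(1) we then obtain

$$\mathit{I\text{-}DSafe}(p_{l_p}) \wedge \mathit{I\text{-}Earliest}(p_{l_p})$$

and therefore

$$\mathit{Insert}_{IBCM}(p_{l_p}) \tag{10.21}$$

Combining (10.20) and (10.21), we have:

$$\mathit{Insert}_{IBCM}(p_{l_p}) \wedge \mathit{SameInc}_p[l_p, i[ \wedge \mathit{Transp}^*_{(p_{l_p})}(p[l_p, i[)$$

Hence, Correct_IBCM(n) holds on p as desired.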

If there is no index j satisfying condition a) of (10.18), but one satisfying
condition b), we obtain

$$\mathit{I\text{-}DSafe}(p_j)$$

and hence, as in the previous case, the index l_p,

$$l_p =_{df} \mathit{Min}\big(\{\, j \mid \mathit{I\text{-}DSafe}(p_j) \wedge \mathit{SameInc}_p[j,i[ \wedge \mathit{Transp}^*_{(p_j)}(p[j,i[) \,\}\big) \tag{10.22}$$

is well-defined for path p. In particular, we have

$$l_p \leq i$$

because l_p > i yields a contradiction to the safety of node n. Thus, the proof
can now be completed as in the first case.

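Part 3). Let p ∈ IP[s*,e*] and i ∈ {1, …, λ_p} with Insert_IBCM(p_i). Clearly,
we have p_i ∈ N* \ N*. Moreover, we also have I-DSafe(p_i), and therefore
N-DSafe*(p_i) as well. This yields

$$\exists i \leq j \leq \lambda_p.\ \mathit{Comp}^*_{(p_i)}(p_j) \wedge \mathit{Transp}^*_{(p_i)}(p[i,j[) \tag{10.23}$$

In particular,

$$j_p =_{df} \mathit{Min}\big(\{\, j \mid i \leq j \leq \lambda_p \wedge \mathit{Comp}^*_{(p_i)}(p_j) \wedge \mathit{Transp}^*_{(p_i)}(p[i,j[) \,\}\big)$$

is well-defined. Hence, the canonicity of IBCM yields

$$\forall i < l \leq j_p.\ \mathit{SameInc}_p[i,l[ \Rightarrow \neg\mathit{Insert}_{IBCM}(p_l) \tag{10.24}$$

and therefore

$$\mathit{Segments}_{(i,j_p)}(p) \in \mathit{FU\text{-}LtRg}(\mathit{IBCM})$$

as desired.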

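Part 4). Let ICM ∈ ICM_Adm, p ∈ IP[s*,e*], and Segments_(i,j)(p) ∈
LtRg(IBCM). Without loss of generality we can assume that the original
computation in p_j is replaced, i.e., Replace_ICM(p_j) holds. The admissibility
of ICM then guarantees

$$\mathit{Correct}_{ICM}(p_j) \tag{10.25}$$

Moreover, Segments_(i,j)(p) ∈ LtRg(IBCM) yields Insert_IBCM(p_i). Finally,
by means of Lemma 10.4.2(1) we have

$$\begin{cases} \bigwedge_{m \in \mathit{caller}(fg(p_i))} \neg\mathit{Transp}(m_c) & \text{if } p_i \in \{s_1, \dots, s_k\} \\ \bigwedge_{m \in \mathit{pred}_{fg(p_i)}(p_i)} \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big) & \text{if } \emptyset \neq \mathit{pred}_{fg(p_i)}(p_i) \subseteq N^* \setminus N_c^* \ \vee \\ & \quad \big(\mathit{pred}_{fg(p_i)}(p_i) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{callee}(m))\big) \\ \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Safe}(m)\big) \wedge \neg\mathit{Safe}(\mathit{end}(\mathit{callee}(m))) & \text{if } \mathit{pred}_{fg(p_i)}(p_i) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{callee}(m)) \end{cases}$$

The Correctness Lemma 10.3.2 now implies

$$\begin{cases} \bigwedge_{m \in \mathit{caller}(fg(p_i))} \big(\neg\mathit{Transp}(m_c) \vee \neg\mathit{Correct}_{ICM}(m_c)\big) & \text{if } p_i \in \{s_1, \dots, s_k\} \\ \bigwedge_{m \in \mathit{pred}_{fg(p_i)}(p_i)} \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Correct}_{ICM}(m)\big) & \text{if } \emptyset \neq \mathit{pred}_{fg(p_i)}(p_i) \subseteq N^* \setminus N_c^* \ \vee \\ & \quad \big(\mathit{pred}_{fg(p_i)}(p_i) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{callee}(m))\big) \\ \big(\neg\mathit{GlobTransp}(m) \vee \neg\mathit{Correct}_{ICM}(m)\big) \wedge \neg\mathit{Correct}_{ICM}(\mathit{end}(\mathit{callee}(m))) & \text{if } \mathit{pred}_{fg(p_i)}(p_i) = \{m\} \subseteq N_c^* \wedge \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{callee}(m)) \end{cases}$$

This yields

$$\neg\mathit{Correct}_{ICM}(p_i) \vee \mathit{Insert}_{ICM}(p_i)$$

A straightforward induction now delivers the validity of the formula

$$\exists i \leq l \leq j.\ \mathit{Insert}_{ICM}(p_l) \wedge \langle p_l \rangle \subseteq \mathit{Segments}_{(i,j)}(p)$$

which completes the proof of part 4. □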

Before proving the main result of this section, we demonstrate that
canonicity is indeed essential for the validity of the third part of the
IBCM-Lemma 10.4.2. To this end we consider the example of Figure 10.17. It
shows the result of the IBCM-transformation for a program whose original
computations of a + b have been replaced by h. Note that the insertion at
node 6 is perfectly down-safe because of the original computations of a + b
at the nodes 8, 9, 14, and 15. Nonetheless, it is not used along the program
continuation passing node 12 of procedure π2. This can easily be verified by
considering the relevant lifetime ranges of this example, which, for
convenience, are highlighted. Note that the light-shaded lifetime range
prefix starting in node 6 cannot be extended to a first-use lifetime range
because of the insertion at node 13.

However, under the premise of canonicity of the IBCM-transformation, and
thus by means of the IBCM-Lemma 10.4.2, we can prove the main result of
this section:

Theorem 10.4.2 (IBCM-Optimality).

If the IBCM-transformation is canonic for the program under consideration,
then it is computationally optimal, i.e.,

$$\mathit{IBCM} \in \mathcal{ICM}_{Can} \Rightarrow \mathit{IBCM} \in \mathcal{ICM}_{CmpOpt}$$
Fig. 10.17. The impact of canonicity on part 3 of IBCM-Lemma 10.4.2



Proof. The admissibility of the IBCM-transformation is guaranteed by
Theorem 10.4.1. Thus, we are left with showing its computational optimality,
i.e., with proving IBCM ∈ ICM_CmpOpt. To this end let ICM ∈ ICM_Adm and
p ∈ IP[s*,e*]. Then we have, as desired:

$$\begin{aligned} \mathit{Comp}_{IBCM}(p) &= |\{\, i \mid \mathit{Insert}_{IBCM}(p_i) \,\}| && (\text{Def. IBCM}) \\ &= |\{\, i \mid \mathit{Segments}_{(i,j)}(p) \in \mathit{FU\text{-}LtRg}(\mathit{IBCM}) \,\}| && (\text{Canon., Lem. 10.3.4, 10.4.2(3)}) \\ &\leq |\{\, i \mid \mathit{Insert}_{ICM}(p_i) \,\}| + |\{\, i \mid \mathit{Comp}(p_i) \wedge \neg\mathit{Replace}_{ICM}(p_i) \,\}| && (\text{Lem. 10.4.2(4)}) \\ &= \mathit{Comp}_{ICM}(p) \end{aligned}$$

□
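The cost measure used in this chain of (in)equations can be illustrated on a single path. The following Python sketch uses a hypothetical helper with paths given as lists of nodes. It counts the computations of t executed along a path after a code-motion transformation: one for every insertion point passed, plus one for every original computation that is not replaced. For IBCM every original computation is replaced, so only the first summand remains, which is the first step of the proof above.

```python
def comp_on_path(path, insert, replace, comp):
    """Computations of t executed along `path` after a transformation.

    path: list of nodes (a node occurs once per visit, e.g. in loops)
    insert, replace: nodes satisfying Insert / Replace of the transformation
    comp: nodes with an original computation of t
    """
    inserted = sum(1 for node in path if node in insert)
    kept = sum(1 for node in path if node in comp and node not in replace)
    return inserted + kept
```

On the path 1, 2, 3, 2, 3, 4 through a loop with an original computation at node 3, the untransformed program executes 2 computations. Hoisting the computation to node 1 executes 1. Inserting at the loop node 2 instead executes 2 and gains nothing.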



The IBCM-transformation is interprocedurally computationally optimal
whenever it is canonic. The converse implication is in general invalid, i.e.,
canonicity is not necessary for the computational optimality of the
IBCM-transformation. For a proof, reconsider the program of Figure 10.5, which
shows the result of the IBCM-transformation for the program of Figure 10.4.
It is computationally optimal, but not canonic. Thus, we have:

Theorem 10.4.3 (Canonicity).

Canonicity of the IBCM-transformation for the program under consideration
is sufficient for its interprocedural computational optimality, but not
necessary.

For the example of Figure 10.5 the IBCM-transformation works perfectly
without being canonic. In general, however, if the IBCM-transformation fails
the canonicity constraint, it violates strictness in the sense of [CLZ], i.e.,
profitability is in general not guaranteed. An extreme example is displayed
in Figure 10.18. It shows a program which is free of any partially redundant
computation. Hence, it is computationally optimal. Figure 10.19 shows the
result of the IBCM-transformation for this program. It inserts a computation
inside the loop, impairing the program dramatically.



Fig. 10.18. The original program

Fig. 10.19. Failure of canonicity causes in general failure of strictness

10.5 The ILCM-Transformation

As in the previous section, we assume that the IBCM-transformation is
canonic for the program under consideration. According to the
IBCM-Optimality Theorem 10.4.2 it is thus computationally optimal. However,
like its intraprocedural counterpart, it does not take the lifetimes of
temporaries into account. Lemma 10.5.2 shows that the lifetimes of temporaries
introduced by the IBCM-transformation are even maximal in this situation, a
fact which can be proved by means of Lemma 10.5.1.

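Lemma 10.5.1 (IBCM-First-Use-Lifetime Range Lemma).

$$\forall \mathit{ICM} \in \mathcal{ICM}_{CmpOpt}\ \forall p \in \mathbf{IP}[s^*,e^*]\ \forall l \leq \lambda_p.\ \mathit{Comp}_{ICM}(p_l) \Rightarrow \exists i \leq l \leq j.\ \langle p_l \rangle \subseteq \mathit{Segments}_{(i,j)}(p) \in \mathit{FU\text{-}LtRg}(\mathit{IBCM})$$

Proof. Let ICM ∈ ICM_CmpOpt, p ∈ IP[s*,e*], and let l be an index with

$$\mathit{Comp}_{ICM}(p_l)$$

Suppose that

$$\forall i \leq l \leq j.\ \langle p_l \rangle \subseteq \mathit{Segments}_{(i,j)}(p) \notin \mathit{FU\text{-}LtRg}(\mathit{IBCM})$$

Then, by the premise of canonicity of the IBCM-transformation, we obtain the
following sequence of (in)equations:

$$\begin{aligned} |\{\, i \mid \mathit{Comp}_{ICM}(p_i) \,\}| &\geq |\{\, i \mid \mathit{Segments}_{(i,j)}(p) \in \mathit{FU\text{-}LtRg}(\mathit{IBCM}) \,\}| + 1 && (\text{Lem. 10.3.4 \& 10.4.2(4)}) \\ &= |\{\, i \mid \mathit{Insert}_{IBCM}(p_i) \,\}| + 1 && (\text{Canon. \& Lem. 10.3.4 \& 10.4.2(3)}) \\ &> |\{\, i \mid \mathit{Insert}_{IBCM}(p_i) \,\}| \\ &= |\{\, i \mid \mathit{Comp}_{IBCM}(p_i) \,\}| \end{aligned}$$

This, however, contradicts the computational optimality of ICM,
completing the proof of Lemma 10.5.1. □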

Using this lemma we can now prove: 

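Lemma 10.5.2 (IBCM-Lifetime Range Lemma).

$$\forall \mathit{ICM} \in \mathcal{ICM}_{CmpOpt}\ \forall P \in \mathit{LtRg}(\mathit{ICM})\ \exists Q \in \mathit{LtRg}(\mathit{IBCM}).\ P \subseteq Q$$

Proof. Let ICM ∈ ICM_CmpOpt, p ∈ IP[s*,e*], and Segments_(i,j)(p) ∈
LtRg(ICM). By means of Definition 10.3.6 we obtain

$$\forall i < l \leq j\ \forall q \in \mathit{Segments}_{(i,j)}(p).\ \langle p_l \rangle \subseteq q \Rightarrow \neg\mathit{Insert}_{ICM}(p_l)$$

Additionally, we also have Replace_ICM(p_j). The IBCM-Lemma 10.4.2(4)
thus yields

$$\forall i < l \leq j\ \forall q \in \mathit{Segments}_{(i,j)}(p).\ \langle p_l \rangle \subseteq q \Rightarrow \neg\mathit{Insert}_{IBCM}(p_l)$$

Lemma 10.5.1 yields the existence of an index i' with i' ≤ i and

$$\mathit{Segments}_{(i,j)}(p) \subseteq P'$$

for some P' ∈ FU-LtRg(IBCM). In particular, we therefore have

$$\mathit{Insert}_{IBCM}(p_{i'}) \wedge \forall l' \in \{i'+1, \dots, i\}.\ \langle p_{l'} \rangle \subseteq P' \Rightarrow \neg\mathit{Insert}_{IBCM}(p_{l'})$$

Summarizing, we have, as desired,

$$\mathit{Segments}_{(i,j)}(p) \subseteq \mathit{Segments}_{(i',j)}(p) \in \mathit{LtRg}(\mathit{IBCM})$$

□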



10.5.1 Specification

In this section we present the interprocedural extension of the
LCM-transformation, called the ILCM-transformation. It enhances the
IBCM-transformation by minimizing the lifetimes of temporaries. Like its
intraprocedural counterpart, the ILCM-transformation is based on the
properties of latestness and isolation.

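Definition 10.5.1 (Interprocedural Delayability).

Let n ∈ N*. n is interprocedurally

1. entry-delayable [in signs: N-Delayable*(n)] ⟺_df

$$\forall p \in \mathbf{IP}[s^*,n]\ \exists 1 \leq i \leq \lambda_p.\ \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i, \lambda_p[ \wedge \neg\mathit{Comp}^*_{(p_i)}(p[i, \lambda_p[)$$

2. exit-delayable [in signs: X-Delayable*(n)] ⟺_df

$$\forall p \in \mathbf{IP}[s^*,n]\ \exists 1 \leq i \leq \lambda_p.\ \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i, \lambda_p] \wedge \neg\mathit{Comp}^*_{(p_i)}(p[i, \lambda_p])$$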
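For intuition, the intraprocedural special case of this definition can be computed as a greatest fixed point. That case has a single flow graph and no incarnations, so SameInc is trivially true and Comp* coincides with Comp. Splitting each path at its last node gives N-Delayable(n) = Insert(n) ∨ ⋀_{m ∈ pred(n)} X-Delayable(m), and X-Delayable(n) = N-Delayable(n) ∧ ¬Comp(n). At the start node only the first disjunct applies. This is a minimal Python sketch of that simplified analysis under the stated assumptions, not the interprocedural IMFP algorithm of the text. All parameter names are hypothetical.

```python
def delayability(nodes, pred, start, insert, comp):
    """Greatest-fixpoint sketch of entry/exit delayability (intraprocedural).

    nodes: list of nodes; pred: dict node -> list of predecessors
    insert, comp: dicts node -> bool (Insert_IBCM and Comp, assumed given)
    Returns (n_delayable, x_delayable) as dicts node -> bool.
    """
    n_del = {n: True for n in nodes}  # optimistic start: greatest fixpoint
    x_del = {n: True for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                new_n = insert[n]  # the only path to start is start itself
            else:
                new_n = insert[n] or all(x_del[m] for m in pred[n])
            new_x = new_n and not comp[n]
            if (new_n, new_x) != (n_del[n], x_del[n]):
                n_del[n], x_del[n] = new_n, new_x
                changed = True
    return n_del, x_del
```

For a straight-line graph 1 → 2 → 3 → 4 with an insertion at 1 and a computation at 3, nodes 1 to 3 are entry-delayable, and delayability stops at the computation in node 3.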
Interprocedural delayability induces the notion of interprocedurally latest
computation points. Intuitively, these are the “maximally delayed” program
points which allow a computationally optimal placement of the program term
under consideration. The definition requires the function
Nodes : S → 𝒫(N*), which maps a flow graph G to its node set enlarged by the
node sets of the flow graphs (indirectly) invoked by call statements of G.
Formally, it is defined by:

$$\forall G \in S.\ \mathit{Nodes}(G) =_{df} N_G \cup \bigcup \{\, \mathit{Nodes}(\mathit{callee}(n)) \mid n \in N_c^* \cap N_G \,\}$$

where N_G =_df {n ∈ N* | fg(n) = G} denotes the set of nodes of the flow
graph G.
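For recursive programs the definition of Nodes is recursive. We read it as its least solution, which is our assumption for the recursive case. Nodes(G) then collects the nodes of all flow graphs reachable from G in the call graph, including G itself. A Python sketch follows. `local_nodes` plays the role of N_G, and `calls` gives the flow graphs called from each flow graph. Both are hypothetical encodings.

```python
def nodes_of(g, local_nodes, calls):
    """Least solution of Nodes(G): union of N_H over all H reachable from G.

    local_nodes: dict flow graph -> set of its own nodes (N_G)
    calls: dict flow graph -> set of flow graphs called from it
    """
    seen, stack, result = set(), [g], set()
    while stack:
        h = stack.pop()
        if h in seen:
            continue  # recursion: each flow graph is collected only once
        seen.add(h)
        result |= local_nodes[h]
        stack.extend(calls.get(h, ()))
    return result
```

With main calling p, and p calling itself and q, Nodes(main) contains the nodes of all three procedures.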

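Definition 10.5.2 (Interprocedural Latestness).

A node n ∈ N* is interprocedurally latest if it satisfies the predicate
I-Latest defined by

$$\mathit{I\text{-}Latest}(n) =_{df} \mathit{N\text{-}Delayable}^*(n) \wedge \left( \mathit{Comp}(n) \vee \begin{cases} \neg \bigwedge_{m \in \mathit{succ}_{fg(n)}(n)} \mathit{N\text{-}Delayable}^*(m) & \text{if } n \in N^* \setminus N_c^* \vee \big(n \in N_c^* \wedge \mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{callee}(n))\big) \\ \neg\mathit{N\text{-}Delayable}^*(\mathit{start}(\mathit{callee}(n))) \wedge \mathit{Required}(\mathit{start}(\mathit{callee}(n))) & \text{otherwise} \end{cases} \right)$$

where

$$\mathit{Required}(\mathit{start}(\mathit{callee}(n))) =_{df} \bigvee_{m \in \mathit{Nodes}(fg(\mathit{start}(\mathit{callee}(n))))} \mathit{Comp}(m)$$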

For nodes outside of N_c* the definition of latestness is straightforward and
coincides with its intraprocedural counterpart. For procedure call nodes, it
needs some more explanation.

If the term t under consideration contains local variables of the called
procedure (i.e., Var(t) ⊄ GlobVar(callee(n))), the incarnation of t which
is valid before entering the procedure is invalid as long as the call is not
finished, because entering the procedure activates a new incarnation of t.
Hence, the original computations of t that led to the insertions of the
IBCM-transformation, and which are now responsible for the delayability of t
at the entry of n, are located on program continuations starting at the exit
of n. As a side effect we thus obtain that the called procedure is transparent
for t, i.e., it does not modify global variables of t, since otherwise the
hoisting of t across the call by the IBCM-transformation would have been
blocked. Together this implies that n blocks the sinking of t only if t is
passed as a parameter, i.e., if Comp(n) holds. Thus, the call node can be
treated like an ordinary node in this case.
In the remaining case, i.e., if all variables of t are global for the procedure
called by n (i.e., Var(t) ⊆ GlobVar(callee(n))), latestness of n depends on
properties of the start node of the called procedure. The point here is that
entering the procedure does not activate a new incarnation of t. The start
node, therefore, takes the role of the successors of n in this case, which
explains the term ¬N-Delayable*(start(callee(n))) occurring in the definition
of latestness. However, this condition and entry-delayability of n are not
sufficient to imply latestness of n. In addition, the value of t must be
required for a use site of t inside the called procedure, i.e., before the
call is finished. This requirement, whose counterpart is automatically
satisfied for ordinary nodes and for call nodes involving local variables,
must be checked explicitly. This is reflected by the predicate Required,
indicating whether there is an occurrence of t inside the called procedure.
Checking this predicate, which does not require a data flow analysis, is
sufficient because of the postulated canonicity of the IBCM-transformation.
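The case distinction of Definition 10.5.2 can be transcribed directly once the predicate values are known. This Python sketch assumes that N-Delayable*, Comp, and the set Nodes(fg(start(callee(n)))) have already been computed and stored in a hypothetical `Node` record. The field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node record; predicate values are assumed precomputed."""
    name: str
    comp: bool = False         # Comp(n)
    n_delayable: bool = False  # N-Delayable*(n)
    succs: List["Node"] = field(default_factory=list)
    # Only used for procedure call nodes:
    callee_start: Optional["Node"] = None  # start(callee(n))
    callee_nodes: List["Node"] = field(default_factory=list)  # Nodes(fg(start(callee(n))))
    t_has_callee_locals: bool = False  # Var(t) not a subset of GlobVar(callee(n))


def i_latest(n: Node) -> bool:
    if not n.n_delayable:
        return False
    if n.comp:
        return True
    if n.callee_start is None or n.t_has_callee_locals:
        # Ordinary node, or a call node that is treated like one.
        return not all(m.n_delayable for m in n.succs)
    # All variables of t are global for the callee: its start node takes the
    # role of the successors, and t must be required inside the callee.
    required = any(m.comp for m in n.callee_nodes)
    return (not n.callee_start.n_delayable) and required
```

The `Required` test is a plain scan over the callee's nodes, which reflects the remark above that no data flow analysis is needed for it.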

After defining latestness, we next introduce the interprocedural version of
the predicate “unusable”. Like its intraprocedural counterpart, it indicates
whether an initialization of h at a specific program point would be
unusable because of the lack of a terminating program continuation containing
a further computation of the same incarnation of t without a preceding
reinitialization of the same incarnation of h.

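Definition 10.5.3 (Interprocedural Unusability).

Let n ∈ N*. n is interprocedurally

1. entry-unusable [in signs: N-Unusable*(n)] ⟺_df

$$\forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ p_i = n \Rightarrow \big(\forall i \leq j \leq \lambda_p.\ \mathit{Comp}^*_{(p_i)}(p_j) \Rightarrow \exists i < l \leq j.\ \mathit{I\text{-}Latest}(p_l) \wedge \mathit{SameInc}_p[i,l[\big)$$

2. exit-unusable [in signs: X-Unusable*(n)] ⟺_df

$$\forall p \in \mathbf{IP}[s^*,e^*]\ \forall i.\ p_i = n \Rightarrow \big(\forall i < j \leq \lambda_p.\ \mathit{Comp}(p_j) \wedge \mathit{SameInc}_p]i,j[ \Rightarrow \exists i < l \leq j.\ \mathit{I\text{-}Latest}(p_l) \wedge \mathit{SameInc}_p]i,l[\big)$$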
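In the intraprocedural special case, exit-unusability reduces to a backward greatest fixed point. This case has no incarnations, so the SameInc conditions are trivially true. Looking at the first node m after n on a path gives X-Unusable(n) = ⋀_{m ∈ succ(n)} (Latest(m) ∨ (¬Comp(m) ∧ X-Unusable(m))), and nodes without successors are vacuously unusable. The Python sketch below implements only this simplified reading, with hypothetical input dictionaries. The interprocedural version additionally tracks incarnations.

```python
def exit_unusable(nodes, succ, latest, comp):
    """Backward greatest-fixpoint sketch of exit-unusability (intraprocedural).

    succ: dict node -> list of successors
    latest, comp: dicts node -> bool (latestness and Comp, assumed given)
    """
    x_unusable = {n: True for n in nodes}  # optimistic start
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = all(latest[m] or (not comp[m] and x_unusable[m])
                      for m in succ[n])
            if new != x_unusable[n]:
                x_unusable[n] = new
                changed = True
    return x_unusable
```

In the chain 1 → 2 → 3 with a computation at 3, a value initialized at the exit of node 1 is usable, unless node 3 is itself a latest point and reinitializes the value.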

Unusability allows us to identify isolated program points. Intuitively, a node
n is interprocedurally isolated if a computation inserted at its entry could
only be used in n itself, i.e., for transferring the value to the statement of
node n. In essence, interprocedural isolation is therefore given by
interprocedural exit-unusability. However, similar to the definition of
latestness, procedure call nodes require special care. For a procedure call
node, unusability after the parameter transfer is decisive for the validity of
the isolation property, i.e., exit-unusability at the node n_c. This node,
however, is not included in the set of nodes of the flow graph system, and
hence exit-unusability at n_c is not computed as a part of the IMFP-solution.
Fortunately, this information can easily be computed for every procedure call
node after computing the IMFP-solution of the underlying data flow problem.
Important are the semantic functions computed in the preprocess of the fixed
point approach:

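$$\forall n \in N_c^*.\ \mathit{X\text{-}Unusable}^*(n_c) = \mathit{top}\Big([\![ \mathit{start}(\mathit{callee}(n)) ]\!]^* \circ [\![ \mathit{start}(\mathit{callee}(n)) ]\!] \circ [\![ n_r ]\!]^*\big(\mathit{X\text{-}Unusable}^*(n)\big)\Big)$$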

In essence, the correctness of this step is a consequence of the Main Lemma
8.4.3 and of the precision of the IMFP-solution of the IDFA-algorithm for
unusability, which is proved in the following chapter. Note also that the
application of ⟦start(callee(n))⟧* can actually be dropped here because of our
assumption that start nodes of procedures represent the empty statement
“skip”. The definition of interprocedural isolation is now as follows.

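Definition 10.5.4 (Interprocedural Isolation).

A node n ∈ N* is interprocedurally isolated if it satisfies the predicate
I-Isolated defined by

$$\mathit{I\text{-}Isolated}(n) =_{df} \begin{cases} \mathit{X\text{-}Unusable}^*(n) & \text{if } n \in N^* \setminus N_c^* \vee \big(n \in N_c^* \wedge \mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{callee}(n))\big) \\ \mathit{X\text{-}Unusable}^*(n_c) & \text{otherwise} \end{cases}$$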

By means of latestness and isolation we can now fix the insertion and
replacement points of the ILCM-transformation. Table 10.3 shows the definition
of the corresponding predicates Insert_ILCM and Replace_ILCM, which specify
the ILCM-transformation completely.

• ∀n ∈ N*. Insert_ILCM(n) =_df I-Latest(n) ∧ ¬I-Isolated(n)

• ∀n ∈ N*. Replace_ILCM(n) =_df Comp(n) ∧ ¬(I-Latest(n) ∧ I-Isolated(n))

Table 10.3: The ILCM-transformation
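Given precomputed values of I-Latest, I-Isolated, and Comp for every node, Table 10.3 amounts to two set comprehensions. A minimal Python sketch, with hypothetical dictionaries as inputs:

```python
def ilcm_points(nodes, i_latest, i_isolated, comp):
    """Return (Insert_ILCM, Replace_ILCM) as node sets, following Table 10.3."""
    insert = {n for n in nodes if i_latest[n] and not i_isolated[n]}
    replace = {n for n in nodes
               if comp[n] and not (i_latest[n] and i_isolated[n])}
    return insert, replace
```

A node that is both latest and isolated keeps its original computation. Neither an insertion nor a replacement happens there, so no temporary is introduced for a value that would only be used at that node.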

Reestablishing Formal Procedure Calls. In this section we discuss how
to reestablish formal procedure calls after the ILCM-transformation. In
essence, this can be done along the lines of Section 10.4.1. Thus, we only
discuss a special case here, which does not occur for the IBCM-transformation.
This case is illustrated in Figure 10.20. It is characterized by the equation
|{Replace_ILCM(n(π)) | π ∈ callee(n)}| = 2, where {n(π) | π ∈ callee(n)} is
the set of nodes introduced for replacing the formal procedure call node n
(cf. Section 10.2, Figure 10.14). As a consequence of the equality
|{Replace_ILCM(n(π)) | π ∈ callee(n)}| = 2, we obtain

$$\bigwedge_{\pi \in \mathit{callee}(n)} \mathit{Comp}(n_c(\pi))$$

together with the existence of two procedures π' and π'' in callee(n) such
that n(π') and n(π'') satisfy ¬Replace_ILCM(n(π')) and Replace_ILCM(n(π'')),
respectively. In particular, this means that n(π') satisfies the conjunction
I-Latest ∧ I-Isolated, whereas n(π'') does not. According to the construction
of S we have |pred_fg(M)(M)| = 1 (cf. Section 10.2, Figure 10.12). Together
with ⋀_{π ∈ callee(n)} Comp(n_c(π)) and I-Latest(n(π')), this yields

$$\bigwedge_{\pi \in \mathit{callee}(n)} \mathit{I\text{-}Latest}(n(\pi))$$

Thus, n(π'') satisfies the predicate I-Latest, but not the predicate I-Isolated.
Consequently, node n is not isolated after coalescing the nodes of M to n.
Therefore, we define

$$\mathit{Insert}_{ILCM}(n) =_{df} \mathit{Replace}_{ILCM}(n) =_{df} \mathit{true}$$




Fig. 10.20. Reestablishing formal procedure calls after the ILCM-transformation



Note that reestablishing formal procedure calls affects neither the number
of computations on a program path from s* to e* nor the lifetimes of
temporaries, except for the trivial lifetime ranges which are unavoidably
introduced for program continuations calling a procedure for which the
computation was isolated before coalescing the call nodes. Without loss of
generality we can therefore prove the lifetime optimality of the
ILCM-transformation for the program before the reconstruction of formal
procedure calls.
10.5.2 Proving Lifetime Optimality

In this section we prove that the ILCM-transformation is (computationally and) lifetime optimal (cf. ILCM-Optimality Theorem 10.5.1). Central to proving this result is the following lemma, whose proof again relies on the canonicity of the IBCM-transformation.

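Lemma 10.5.3 (ILCM-Lemma).

1. $\forall n \in N^*.\ \textit{N-Delayable}^*(n) \Rightarrow \textit{I-DSafe}(n)$

2. $\forall p \in IP[s^*,e^*]\ \forall l.\ \textit{N-Delayable}^*(p_l) \Rightarrow \exists\, i \le l \le j.\ (p_l) \sqsubseteq Segment_{(i,j)}(p) \in \textit{FU-LtRg}(IBCM)$

3. $\forall p \in IP[s^*,e^*]\ \forall\, Segment_{(i,j)}(p) \in \textit{FU-LtRg}(IBCM)\ \exists\, i \le l \le j.\ \textit{I-Latest}(p_l) \wedge (p_l) \sqsubseteq Segment_{(i,j)}(p)$

4. $\forall p \in IP[s^*,e^*]\ \forall\, Segment_{(i,j)}(p) \in LtRg(IBCM)\ \forall\, i \le l \le j.\ \textit{I-Latest}(p_l) \wedge (p_l) \sqsubseteq Segment_{(i,j)}(p) \Rightarrow (\forall\, l < l' \le j.\ (p_{l'}) \sqsubseteq Segment_{(i,j)}(p) \Rightarrow \neg\textit{N-Delayable}^*(p_{l'}))$

5. $\forall ICM \in ICM_{CmpOpt}\ \forall n \in N^*.\ Comp_{ICM}(n) \Rightarrow \textit{N-Delayable}^*(n)$

6. $\forall p \in IP[s^*,e^*]\ \forall\, Segment_{(i,i)}(p) \in LtRg(ILCM)\ \exists\, p' \in IP[s^*,e^*].\ p[1,i] = p'[1,i] \wedge \exists\, j > i.\ Segment_{(i,j)}(p') \in LtRg(ILCM)$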



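Proof. Part 1). Let $n \in N^*$ satisfy the predicate N-Delayable*. Then we have

$$
\begin{aligned}
&\textit{N-Delayable}^*(n)\\
&\Rightarrow \forall p \in IP[s^*,n]\ \exists\, 1 \le i \le \lambda_p.\ Insert_{IBCM}(p_i) \wedge SameInc_p[i,\lambda_p[ \wedge \neg Comp^*_{(p_i)}(p[i,\lambda_p[) && \text{(Def. 10.5.1(1))}\\
&\Rightarrow \forall p \in IP[s^*,n]\ \exists\, 1 \le i \le \lambda_p.\ \textit{I-DSafe}(p_i) \wedge SameInc_p[i,\lambda_p[ \wedge \neg Comp^*_{(p_i)}(p[i,\lambda_p[) && \text{(Def. } Insert_{IBCM})\\
&\Rightarrow \forall p \in IP[s^*,n]\ \exists\, 1 \le i \le \lambda_p.\ \textit{N-DSafe}^*(p_i) \wedge SameInc_p[i,\lambda_p[ \wedge \neg Comp^*_{(p_i)}(p[i,\lambda_p[) && (p_i \in N^* \setminus N_c^*)\\
&\Rightarrow \textit{N-DSafe}^*(n) && \text{(Def. 10.3.1(2b))}\\
&\Rightarrow \textit{I-DSafe}(n)
\end{aligned}
$$

The last implication needs some explanation. If $n \in N^* \setminus N_c^*$, it is trivially satisfied. Otherwise, i.e., if $n \in N_c^*$, it follows from the validity of $\textit{N-DSafe}^*(n)$ and $\textit{N-Delayable}^*(n)$, and the definition of I-DSafe.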

Part 2). Let $p \in IP[s^*,e^*]$, and let $l$ be an index with $\textit{N-Delayable}^*(p_l)$. Then Definition 10.5.1(1) guarantees the existence of an index $i$ with

$Insert_{IBCM}(p_i) \wedge SameInc_p[i,l[ \wedge \neg Comp^*_{(p_i)}(p[i,l[) \qquad (10.26)$

By means of (10.26), the IBCM-Lemma 10.4.2(3) guarantees that there is an index $j$ with $j \ge l$ and

$(p_l) \sqsubseteq Segment_{(i,j)}(p) \in \textit{FU-LtRg}(IBCM)$

which completes the proof of part 2.

Part 3). Let $p \in IP[s^*,e^*]$ and $Segment_{(i,j)}(p) \in \textit{FU-LtRg}(IBCM)$. This implies

$Insert_{IBCM}(p_i) \wedge Replace_{IBCM}(p_j)$

Additionally, Definition 10.5.1(1) and $Replace_{IBCM}(p_j)$ yield

$\textit{N-Delayable}^*(p_i)$ and $Comp(p_j)$,

respectively. Thus,

$l' =_{df} Max(\{ l \mid i \le l \le j.\ \textit{N-Delayable}^*(p_l) \})$

is well-defined. Moreover, by means of Definition 10.5.1(1) it can easily be shown that

$p_{l'} \in N^* \setminus N_c^*$

If $l' = j$, Definition 10.5.2 directly yields

$\textit{I-Latest}(p_{l'}) \wedge (p_{l'}) \sqsubseteq Segment_{(i,j)}(p)$

Otherwise, i.e., if $l' < j$, $\textit{I-Latest}(p_{l'})$ is proved by a simple case analysis on the type of node $p_{l'}$ using Definition 10.5.1(1), Definition 10.5.2, and the maximality of $l'$.

Part 4). Let $p \in IP[s^*,e^*]$, $Segment_{(i,j)}(p) \in LtRg(IBCM)$, and let $l$ be an index with $\textit{I-Latest}(p_l)$ and $(p_l) \sqsubseteq Segment_{(i,j)}(p)$. In particular, this implies $p_l \in N^* \setminus N_c^*$. Without loss of generality we can assume $l < j$. Definition 10.3.6 then yields

$\forall\, i < l' \le j.\ (p_{l'}) \sqsubseteq Segment_{(i,j)}(p) \Rightarrow \neg Insert_{IBCM}(p_{l'})$

Let now $\bar{l} > l$ with

$(p_{\bar{l}}) \sqsubseteq Segment_{(i,j)}(p) \wedge \forall\, l < l'' < \bar{l}.\ (p_{l''}) \sqsubseteq Segment_{(i,j)}(p)$

Then we obtain by means of Definition 10.5.1(1) and Definition 10.5.2

$\neg\textit{N-Delayable}^*(p_{\bar{l}})$

A straightforward induction now yields the desired result.
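Part 5). Let $ICM \in ICM_{CmpOpt}$ and $n \in N^*$ with

$Comp_{ICM}(n)$

Then the following sequence of implications proves part 5:

$$
\begin{aligned}
&Comp_{ICM}(n)\\
&\Rightarrow (Comp(n) \wedge \neg Replace_{ICM}(n)) \vee Insert_{ICM}(n) && \text{(Def. } Comp_{ICM})\\
&\Rightarrow \forall p \in IP[s^*,e^*]\ \forall\, l \le \lambda_p.\ p_l = n \Rightarrow \exists\, i \le l \le j.\ (p_l) \sqsubseteq Segment_{(i,j)}(p) \in \textit{FU-LtRg}(IBCM) && \text{(Lemma 10.5.1)}\\
&\Rightarrow \forall p \in IP[s^*,e^*]\ \forall\, l \le \lambda_p.\ p_l = n \Rightarrow \exists\, i \le l \le j.\ Insert_{IBCM}(p_i) \wedge SameInc_p[i,l[ \wedge \neg Comp^*_{(p_i)}(p[i,l[) && \text{(Def. } LtRg(IBCM))\\
&\Rightarrow \textit{N-Delayable}^*(n) && \text{(Def. 10.5.1(1))}
\end{aligned}
$$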



Part 6). Let $p \in IP[s^*,e^*]$, and let $i$ be an index with $Segment_{(i,i)}(p) \in LtRg(ILCM)$. Thus, we have $Insert_{ILCM}(p_i)$, and therefore

$\textit{I-Latest}(p_i) \wedge \neg\textit{I-Isolated}(p_i) \qquad (10.27)$

Suppose now that

$\forall p' \in IP[s^*,e^*].\ p[1,i] = p'[1,i] \Rightarrow \forall\, j > i.\ Segment_{(i,j)}(p') \notin LtRg(ILCM)$

Then Definition 10.5.3(2) and Definition 10.5.4 yield

$\textit{I-Isolated}(p_i)$

This, however, contradicts conjunction (10.27), which proves part 6.

□

This suffices to prove the central theorem of this section:

Theorem 10.5.1 (ILCM-Optimality).

If the IBCM-transformation is canonic for the program under consideration, the ILCM-transformation is computationally and lifetime optimal, i.e.,

$IBCM \in ICM_{Can} \Rightarrow ILCM \in ICM_{LtOpt}$

Proof. The proof of Theorem 10.5.1 is decomposed into three steps: first, proving that the ILCM-transformation is admissible; second, that it is computationally optimal; and third, that it is lifetime optimal.

In order to prove $ILCM \in ICM_{Adm}$, it must be shown that all insertions and replacements are safe and correct, i.e., it must be shown

i) $Insert_{ILCM} \Rightarrow Safe$
ii) $Replace_{ILCM} \Rightarrow Correct_{ILCM}$
$Insert_{ILCM}$ is defined by

$\forall n \in N^*.\ Insert_{ILCM}(n) =_{df} \textit{I-Latest}(n) \wedge \neg\textit{I-Isolated}(n)$

In order to prove i), we show the even stronger property

$\forall n \in N^*.\ \textit{I-Latest}(n) \Rightarrow \textit{S-Safe}(n)$

Thus, let $n \in N^*$ satisfy the predicate I-Latest. Definition 10.5.2 then yields

$\textit{N-Delayable}^*(n)$

Hence, the first part of the ILCM-Lemma 10.5.3 delivers $\textit{I-DSafe}(n)$, which implies $\textit{N-DSafe}^*(n)$, and hence, $\textit{S-Safe}(n)$. By means of Definition 10.3.1, we then obtain as desired

$Safe(n)$

In order to prove ii), consider a node $n \in N^*$ satisfying $Replace_{ILCM}$. Obviously, this guarantees $Comp(n)$. In order to complete the proof, we must investigate the following two cases.

Case 1. $\textit{I-Latest}(n)$

In this case, we have $\neg\textit{I-Isolated}(n)$. This directly implies $Insert_{ILCM}(n)$. Clearly, for all paths $p \in IP[s^*,n]$ the predicate $SameInc_p[\lambda_p,\lambda_p[$ is satisfied, and therefore $Correct_{ILCM}(n)$ holds trivially.

Case 2. $\neg\textit{I-Latest}(n)$

Obviously, we have $Correct_{IBCM}(n)$ because the IBCM-transformation is admissible. Moreover, we have $\neg Insert_{IBCM}(n)$, since $Insert_{IBCM}(n)$ would directly imply $\textit{I-Latest}(n)$, in contradiction to the premise of Case 2. Thus, we have

$\forall p \in IP[s^*,n]\ \exists\, i < \lambda_p.\ Insert_{IBCM}(p_i) \wedge SameInc_p[i,\lambda_p[ \wedge Transp^*_{(p_i)}(p[i,\lambda_p[)$

In particular,

$i_p =_{df} Max(\{ i \mid i < \lambda_p \wedge Insert_{IBCM}(p_i) \wedge SameInc_p[i,\lambda_p[ \wedge Transp^*_{(p_i)}(p[i,\lambda_p[) \})$

is well-defined. Moreover, the IBCM-Lemma 10.4.2(3) delivers the existence of an index $j_p$ with

$i_p \le j_p$

such that

$Segment_{(i_p,j_p)}(p) \in \textit{FU-LtRg}(IBCM)$

The ILCM-Lemma 10.5.3(3) now yields

$\exists\, i_p \le l \le j_p.\ \textit{I-Latest}(p_l) \wedge (p_l) \sqsubseteq Segment_{(i_p,j_p)}(p)$
Clearly, we have

$l < \lambda_p$

Thus, by means of Definition 10.5.3(2) and Definition 10.5.4 we obtain

$\neg\textit{I-Isolated}(p_l)$

and therefore also

$Insert_{ILCM}(p_l)$

Hence, we have as desired

$Insert_{ILCM}(p_l) \wedge SameInc_p[l,\lambda_p[ \wedge Transp^*_{(p_l)}(p[l,\lambda_p[)$

which proves ii).

The second step, i.e., proving $ILCM \in ICM_{CmpOpt}$, is a consequence of the following sequence of inequations, which guarantees that the ILCM-transformation causes at most as many computations on a path $p \in IP[s^*,e^*]$ during run-time as the computationally optimal IBCM-transformation (cf. Theorem 10.4.2).

$$
\begin{aligned}
&|\{ i \mid Comp_{ILCM}(p_i) \}|\\
&= |\{ i \mid Insert_{ILCM}(p_i) \}| + |\{ i \mid Comp(p_i) \wedge \neg Replace_{ILCM}(p_i) \}| && \text{(Def. } Comp_{ILCM})\\
&= |\{ i \mid \textit{I-Latest}(p_i) \wedge \neg\textit{I-Isolated}(p_i) \}| + |\{ i \mid Comp(p_i) \wedge \textit{I-Latest}(p_i) \wedge \textit{I-Isolated}(p_i) \}| && \text{(Def. } ILCM)\\
&\le |\{ i \mid \textit{I-Latest}(p_i) \}|\\
&= |\{ i \mid Segment_{(i,j)}(p) \in \textit{FU-LtRg}(IBCM) \}| && \text{(Lem. 10.3.4, 10.5.3(2,4))}\\
&= |\{ i \mid Insert_{IBCM}(p_i) \}| && \text{(Canon. \& Lem. 10.3.4)}\\
&= |\{ i \mid Comp_{IBCM}(p_i) \}| && \text{(Def. } IBCM)
\end{aligned}
$$
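The inequation step of this chain can be replayed mechanically on short paths. The following Python sketch assumes a hypothetical encoding of each path node as a triple (Comp, I-Latest, I-Isolated), counts the computations of the ILCM-transformation as $|\{i \mid Insert_{ILCM}(p_i)\}| + |\{i \mid Comp(p_i) \wedge \neg Replace_{ILCM}(p_i)\}|$, and checks exhaustively that this never exceeds the number of latest points. It does not model the remaining equalities, which rely on Lemma 10.3.4, Lemma 10.5.3, and canonicity.

```python
from itertools import product
from typing import Sequence, Tuple

# Hypothetical encoding of a path node: (Comp, I-Latest, I-Isolated).
Node = Tuple[bool, bool, bool]


def ilcm_computations(path: Sequence[Node]) -> int:
    """|{i | Insert_ILCM(p_i)}| + |{i | Comp(p_i) and not Replace_ILCM(p_i)}|."""
    count = 0
    for comp, latest, isolated in path:
        insert = latest and not isolated
        replace = comp and not (latest and isolated)
        # The two events are disjoint, so each node contributes at most one.
        count += int(insert) + int(comp and not replace)
    return count


def latest_points(path: Sequence[Node]) -> int:
    """|{i | I-Latest(p_i)}|."""
    return sum(1 for _, latest, _ in path if latest)


# Exhaustive check of the inequation step for all paths of length 3.
nodes = list(product([False, True], repeat=3))
assert all(
    ilcm_computations(p) <= latest_points(p) for p in product(nodes, repeat=3)
)
```

A node contributes a computation only if it is latest, which is why the bound holds path by path.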



In order to prove the third step, i.e., $ILCM \in ICM_{LtOpt}$, it must be shown that

$\forall ICM \in ICM_{CmpOpt}\ \forall P \in LtRg(ILCM)\ \exists Q \in LtRg(ICM).\ P \sqsubseteq Q$

Thus, let $ICM \in ICM_{CmpOpt}$, $p \in IP[s^*,e^*]$, and $P =_{df} Segment_{(i,j)}(p) \in LtRg(ILCM)$. Obviously, we have

$\neg\textit{I-Isolated}(p_i)$

Thus, if $i = j$, Lemma 10.5.3(6) yields the existence of a path $p' \in IP[s^*,e^*]$ with $p[1,i] = p'[1,i]$ and of an index $l$ with $j < l$, and

$Segment_{(i,j)}(p) \sqsubseteq Segment_{(i,l)}(p') \in LtRg(ILCM)$
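Without loss of generality we can thus assume that

$i < j$

Then we obtain as desired

$$
\begin{aligned}
&P \in LtRg(ILCM)\\
&\Rightarrow \exists Q \in LtRg(IBCM).\ P \sqsubseteq Q && \text{(Lemma 10.5.2)}\\
&\Rightarrow \forall\, i < l \le j.\ (p_l) \sqsubseteq P \Rightarrow \neg\textit{N-Delayable}^*(p_l) && \text{(Lemma 10.5.3(4))}\\
&\Rightarrow \forall\, i < l \le j.\ (p_l) \sqsubseteq P \Rightarrow \neg Insert_{ICM}(p_l) \wedge (\neg Comp(p_l) \vee Replace_{ICM}(p_l)) && (i < j \text{ \& Lemma 10.5.3(5)})\\
&\Rightarrow \exists Q \in LtRg(ICM).\ P \sqsubseteq Q && (ICM \in ICM_{Adm})
\end{aligned}
$$

□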

As a corollary of Theorem 10.3.3 and Theorem 10.5.1 we get:

Corollary 10.5.1. $IBCM \in ICM_{Can} \Rightarrow ICM_{LtOpt} = \{ILCM\}$



10.6 An Example

In this section we illustrate, under the premise of canonicity, the power of the IBCM- and ILCM-transformation. To this end we consider the program $\Pi$ of Figure 10.21, which is complex enough to illustrate the essential features of the two transformations. Synthetic nodes, which are not relevant for the transformations because they are not required for inserting computations, are omitted in order to keep the example as small as possible. Analogously, this holds for the formal procedure call at node 9, which is not (explicitly) replaced by the set of ordinary procedure calls it can invoke.

Note that $\Pi \in Prog_{fm(2),wgfpp}$ and that $\Pi$ satisfies the sfmr-property. Whereas the first proposition is obvious when inspecting the program of Figure 10.21, the second proposition needs some more explanation because $\Pi$ is composed of statically nested procedures. The point here is that the static predecessor of all procedures which are passed as an argument is the main procedure $\pi_1$.

Since $\Pi$ is of mode depth 2, we know that the set of formally reachable procedures of $\Pi$ is effectively computable by an algorithm of quadratic time complexity [Ar3]. The algorithm yields the set

$\{ \pi_0, \pi_1, \pi_{11}, \pi_{111}, \pi_{12}, \pi_{13} \}$

Moreover, $\Pi$ is free of global procedure parameters. Thus, by means of Corollary 6.4.1 and Theorem 6.4.3 we know that formal callability and potential passability coincide on $\Pi$, and can be computed by the algorithm of the HO-DFA of Chapter 6, which is also of quadratic time complexity. Parameterized with the set of formally reachable procedures of $\Pi$, this algorithm yields:

$\mathcal{PP}(\phi_{11}) = \{ \pi_{12}, \pi_{13} \} = \mathcal{FC}(\phi_{11})$
Fig. 10.21. A complex example: the interprocedural argument program



By means of the function callee, the results of the HO-DFA are made available for the subsequent IDFAs. This implies that the IDFA-algorithms for computing the program properties involved in the IBCM- and ILCM-transformation treat the formal procedure call statement at node 9 as a higher-order branch statement, i.e., it is assumed to nondeterministically call the procedures $\pi_{12}$ and $\pi_{13}$.
[Legend: Down-Safe, Earliest]

Fig. 10.22. Interprocedurally down-safe and earliest program points



After these preliminary considerations, Figure 10.22 shows the result of computing the sets of interprocedurally down-safe and earliest program points. They induce the insertion points of the IBCM-transformation, whose result is shown in Figure 10.23. Note that the flow graph system of Figure 10.23 is canonic. Hence, Theorem 10.4.2 is applicable and yields that the program of Figure 10.23 is interprocedurally computationally optimal, a fact which can easily be checked by inspecting this program.
Fig. 10.23. The result of the IBCM-transformation



The canonicity of the IBCM-transformation implies the computational and lifetime optimality of the ILCM-transformation. Figure 10.24 and Figure 10.25 show the results of computing the sets of interprocedurally delayable and latest, and of latest and isolated program points, respectively. Analogously to the intraprocedural setting, the latest and isolated program points induce the computation points of the interprocedural version of the lazy code motion transformation, the ILCM-transformation. Figure 10.26 shows the result of this transformation.
[Legend: Delayable, Latest]

Fig. 10.24. Interprocedurally delayable and latest program points



In this example, the ILCM-transformation is remarkable because it eliminates the partially redundant computations of a + b in the nodes 17, 18, 21, 29, and 30 by moving them to the nodes 16, 28, and 29, but it does not touch the computations of a + b in the nodes 7, 8, and 25, which cannot be moved with run-time gain. This confirms that computations are only moved when it is profitable. In fact, the flow graph system of Figure 10.26 is interprocedurally computationally and lifetime optimal.

Note that no previously proposed transformation for interprocedural partial redundancy elimination would succeed on this example.




[Legend: Isolated, Latest]

Fig. 10.25. Interprocedurally latest and isolated program points










11. Optimal Interprocedural Code Motion: 
The IDFA-Algorithms 



In this chapter we present the interprocedural counterparts of the DFA-algorithms of Chapter 4. For every program property involved in the IBCM-transformation and the ILCM-transformation we specify an IDFA-algorithm and prove it to be precise for the respective property. The specifications and the proofs of precision follow the cookbook style of Section 9.2. Hence, the specifications consist of a lattice of data flow information, a local semantic functional, a return functional, a start information, and an interpretation of the lattice elements in the set of Boolean truth values. The proofs of precision consist of four steps: proving that the function lattice satisfies the descending chain condition, that the local semantic functions and the return functions are distributive, and finally, that the interprocedural meet over all paths solution is precise for the program property under consideration. All proofs are given in full detail in order to illustrate the similarities to their intraprocedural counterparts. For convenience, throughout Chapter 11 we identify the specification of an IDFA with the algorithm it induces (cf. Definition 8.6.1).



11.1 IDFA-Algorithm $A_{ds}$: Interprocedural Down-Safety

In this section we present the IDFA-algorithm $A_{ds}$ for computing the set of interprocedurally down-safe program points.¹ The main result applying to this algorithm is the $A_{ds}$-Precision Theorem 11.1.2. It guarantees that $A_{ds}$ is precise for this property: it terminates with the set of program points which are interprocedurally down-safe in the sense of Definition 10.4.1.

We recall that the computation of interprocedurally down-safe program points requires a backward analysis of S (cf. Section 8.7 and Section 9.2.1). Thus, the roles of call nodes and return nodes in the definitions of the local semantic functions (cf. Section 11.1.1) and the return functions (cf. Section 11.1.1) are interchanged. Similarly, the start information is attached to the end node of the program (cf. Section 11.1.1).

¹ The index ds stands for down-safety.



J. Knoop: Optimal Interprocedural Program Optimization, LNCS 1428, pp. 209-248, 1998. 
© Springer-Verlag Berlin Heidelberg 1998
11.1.1 Specification

Data Flow Information. The domain of data flow information of $A_{ds}$ is given by the lattice

$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathbb{B} \times \mathcal{N},\ Min,\ \le,\ (false, 0),\ (true, \infty))$

We recall that $\mathbb{B}$ denotes the set of Boolean truth values $\{true, false\}$ with $false < true$. New is the second component $\mathcal{N}$. It denotes the set of natural numbers enriched by a new element $\infty$, which is assumed to be the top element of $\mathcal{N}$ with respect to the minimum function. The minimum function $Min$ and the relation $\le$ are defined pointwise, where $Min$ and $\le$ are generalized to $\mathbb{B}$ in the natural way.

The structure of this lattice is more complicated than that of its intraprocedural counterpart. This is necessary in order to deal with different incarnations of a term during the analysis. Recall that in the intraprocedural setting there exists only a single incarnation of each program term. Interprocedurally, however, there are potentially infinitely many incarnations of a term in the presence of recursive procedures with local variables. This requires the safety analysis to distinguish between modifications of global and local variables. Technically, this is achieved by considering the product lattice defined above as the domain of data flow information. The first component of a data flow information attached to a program point expresses whether a placement of t is interprocedurally down-safe at this point. The second component keeps track of the static level of variables of t which have been modified in the current procedure call. Whenever an operand of t is modified by a statement, and the static level this operand is declared on is lower than that of previously modified operands of t, the second component is updated accordingly, i.e., with the level this operand is declared on. Intuitively, this means that the second component always stores the static level of the "most globally" declared variable which was modified in the most recent procedure call. The special element $\infty$ expresses that in the call under consideration no operands of t have been modified.

For the effectivity of the analysis, it is important that for a fixed program $\Pi \in Prog$ only a finite subset of $\mathcal{N}$ is relevant for $A_{ds}$, namely $\mathcal{N}_\Pi =_{df} \{0, \ldots, Max(\{ StatLevel(\pi) \mid \pi \in \Pi \})\} \cup \{\infty\}$, where $Max$ denotes the maximum function. The finiteness of this subset guarantees the termination of $A_{ds}$ (cf. Section 11.1.2).
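A minimal Python model of this lattice shows how meet and order act componentwise. It assumes that `math.inf` plays the role of $\infty$ and that the static levels of the procedures of $\Pi$ are given as a list of integers; all names (`Fact`, `meet`, `leq`, `relevant_levels`) are hypothetical.

```python
import math
from typing import List, Tuple

INF = math.inf  # stands for the extra top element of the level component

# A data flow fact (b, z): b is the down-safety flag, z the lowest static
# level of a modified operand of t in the current call (INF if none).
Fact = Tuple[bool, float]

BOTTOM: Fact = (False, 0)
TOP: Fact = (True, INF)


def meet(x: Fact, y: Fact) -> Fact:
    """Componentwise minimum; on booleans min coincides with conjunction."""
    return (min(x[0], y[0]), min(x[1], y[1]))


def leq(x: Fact, y: Fact) -> bool:
    """Componentwise order, with False < True and every level < INF."""
    return x[0] <= y[0] and x[1] <= y[1]


def relevant_levels(static_levels: List[int]) -> List[float]:
    """The finite subset N_Pi = {0, ..., max static level} plus INF."""
    return list(range(max(static_levels) + 1)) + [INF]


print(meet((True, 2), (False, INF)))
print(relevant_levels([0, 1, 2, 1]))
```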

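Local Semantic Functional. The local semantic functional $[\![\ ]\!]_{ds} : N^* \to (\mathbb{B} \times \mathcal{N} \to \mathbb{B} \times \mathcal{N})$ of $A_{ds}$ is defined by

$\forall n \in N^*\ \forall (b,z) \in \mathbb{B} \times \mathcal{N}.\ [\![ n ]\!]_{ds}(b,z) =_{df} (b', z')$

where

$b' =_{df} Comp(n) \vee (Transp(n) \wedge b)$

$z' =_{df} \begin{cases} Min(\{z, ModLev(n)\}) & \text{if } n \in N^* \setminus N_r^* \\ \infty & \text{otherwise} \end{cases}$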

Note that the functions $[\![ n ]\!]_{ds}$, $n \in N^*$, manipulate the two components of an argument independently of each other. In particular, the first component is treated as by the corresponding intraprocedural function. The treatment of the second component reflects the bookkeeping on the static levels of variables of t which have been modified.
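The local semantic function of a single node can be sketched as follows. The Python function below is a hypothetical model: the node is described by its Comp and Transp values, by $ModLev(n)$ (assumed to be `math.inf` when the node modifies no operand of t), and by a flag telling whether it is a return node, which is our reading of the "otherwise" case where a new procedure call is entered during the backward analysis.

```python
import math
from typing import Tuple

INF = math.inf
Fact = Tuple[bool, float]


def local_ds(comp: bool, transp: bool, mod_lev: float, is_return_node: bool,
             fact: Fact) -> Fact:
    """Sketch of [[n]]_ds(b, z) = (b', z') for a single node n.

    comp, transp: the local predicates Comp(n) and Transp(n).
    mod_lev: ModLev(n), assumed INF when n modifies no operand of t.
    is_return_node: assumed reading of the 'otherwise' case of z'.
    """
    b, z = fact
    b_new = comp or (transp and b)          # as in the intraprocedural case
    z_new = INF if is_return_node else min(z, mod_lev)
    return b_new, z_new


# A node modifying an operand of t declared on level 1 kills down-safety:
print(local_ds(False, False, 1, False, (True, 3)))
```

The two components are computed independently, mirroring the remark above.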

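Return Functional. The return functional $\mathcal{R}_{ds} : N_c^* \to ((\mathbb{B} \times \mathcal{N})^2 \to \mathbb{B} \times \mathcal{N})$ of $A_{ds}$ is defined by

$\forall n \in N_c^*\ \forall ((b_1,z_1),(b_2,z_2)) \in (\mathbb{B} \times \mathcal{N})^2.\ \mathcal{R}_{ds}(n)((b_1,z_1),(b_2,z_2)) =_{df} (b', z')$

where

$b' =_{df} \begin{cases} b_2 & \text{if } Var(t) \subseteq GlobVar(fg(succ^*(n))) \\ Comp(n) \vee (b_1 \wedge (z_2 > StatLevel(fg(succ^*(n))))) & \text{otherwise} \end{cases}$

$z' =_{df} Min(\{z_1, z_2\})$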

Intuitively, if all variables of t are global with respect to the called procedure, entering a procedure is essentially the same as traversing an ordinary edge, i.e., t is down-safe before the call (i.e., $b' = true$) if it is down-safe immediately before entering the procedure (i.e., $b_2 = true$). Otherwise, i.e., if t contains local variables of the called procedure, it is down-safe before the call if it occurs as an actual parameter (i.e., $Comp(n)$), or if it is down-safe after the call (i.e., $b_1 = true$) and none of its operands which are global for the called procedure are modified by it (i.e., $z_2 > StatLevel(fg(succ^*(n)))$).
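The case distinction of the return functional translates directly into code. In the following hypothetical Python sketch, `after_call` stands for $(b_1, z_1)$, the information after the call, and `callee_entry` for $(b_2, z_2)$, the information at the entry of the called procedure; whether Var(t) is global for the callee and the callee's static level are passed in as plain parameters.

```python
import math
from typing import Tuple

INF = math.inf
Fact = Tuple[bool, float]


def return_ds(comp: bool, t_vars_global: bool, callee_level: int,
              after_call: Fact, callee_entry: Fact) -> Fact:
    """Sketch of R_ds(n)((b1, z1), (b2, z2)) for a call node n.

    after_call = (b1, z1): information after the call.
    callee_entry = (b2, z2): information at the entry of the called procedure.
    t_vars_global: whether Var(t) is a subset of the callee's global variables.
    callee_level: StatLevel of the called procedure.
    """
    b1, z1 = after_call
    b2, z2 = callee_entry
    if t_vars_global:
        b_new = b2  # the call behaves like an ordinary edge for t
    else:
        # t has local operands of the callee: rely on the information after
        # the call, unless a global operand of t is modified by the callee.
        b_new = comp or (b1 and z2 > callee_level)
    return b_new, min(z1, z2)


print(return_ds(False, False, 1, (True, INF), (False, 2)))
```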

Start Information. The start information of $A_{ds}$ is given by the element

$(false, \infty) \in \mathbb{B} \times \mathcal{N}$

Recall that down-safety requires a backward analysis. Thus, the start information is attached to the end node of S. Intuitively, the first component of the start information expresses that t cannot be used after the termination of the argument program. Similarly, the second component expresses that no variables of t can be modified after the termination.

Interpretation. The interpretation of lattice elements in $\mathbb{B}$ is given by the function $Int_{ds} : \mathbb{B} \times \mathcal{N} \to \mathbb{B}$ defined by

$\forall (b,z) \in \mathbb{B} \times \mathcal{N}.\ Int_{ds}(b,z) =_{df} b$
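The interplay of start information, local semantic functions, return functions, and interpretation can be illustrated by evaluating a single postfix path backwards with an explicit stack, in the spirit of the stack-based semantics used in Section 11.1.2. The Python sketch below is a simplified model under stated assumptions: nodes are encoded by a hypothetical `Node` record, every call node on the path is matched by a later return node, and at return nodes the level component is reset to $\infty$ (our reading of the "otherwise" case of the local semantic functional).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

INF = math.inf
Fact = Tuple[bool, float]
START: Fact = (False, INF)  # start information, attached to the end node e*


@dataclass(frozen=True)
class Node:
    """Hypothetical node encoding; kind is "plain", "call", or "return"."""
    kind: str
    comp: bool = False            # Comp(n)
    transp: bool = True           # Transp(n)
    mod_lev: float = INF          # ModLev(n); INF if no operand of t changes
    t_vars_global: bool = True    # call nodes: Var(t) within callee globals
    callee_level: int = 0         # call nodes: StatLevel of the callee


def local_ds(n: Node, fact: Fact) -> Fact:
    b, z = fact
    b_new = n.comp or (n.transp and b)
    # Assumed reading: a return node traversed backwards enters a new call.
    z_new = INF if n.kind == "return" else min(z, n.mod_lev)
    return b_new, z_new


def return_ds(n: Node, after_call: Fact, entry: Fact) -> Fact:
    (b1, z1), (b2, z2) = after_call, entry
    if n.t_vars_global:
        b_new = b2
    else:
        b_new = n.comp or (b1 and z2 > n.callee_level)
    return b_new, min(z1, z2)


def down_safe_at_start(path: List[Node]) -> bool:
    """Int_ds of the top of the stack after evaluating path backwards."""
    stack: List[Fact] = [START]
    for n in reversed(path):
        if n.kind == "return":       # callee entered backwards: push
            stack.append(local_ds(n, stack[-1]))
        elif n.kind == "call":       # callee left backwards: combine
            entry = stack.pop()
            after_call = stack.pop()
            stack.append(return_ds(n, after_call, entry))
        else:                        # ordinary node: update the top entry
            stack[-1] = local_ds(n, stack[-1])
    return stack[-1][0]              # Int_ds(b, z) = b


# t (with local operands of the callee) is computed after a call that leaves
# its global operands intact, so it is down-safe before the call.
ok = [Node("call", t_vars_global=False, callee_level=1), Node("plain"),
      Node("return"), Node("plain", comp=True)]
print(down_safe_at_start(ok))
```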
11.1.2 Proving Precision

We first present two lemmas which simplify the proof of the ds-Precision Theorem 11.1.1. The first one is essentially a consequence of Lemma 7.2.1, which guarantees that the intervals of call nodes and return nodes on a program path are either disjoint or one is included in the other. This lemma can be proved by a straightforward induction on the length of program paths. The second lemma is the interprocedural version of Lemma 4.1.3. It follows from the definitions of the local semantic functional $[\![\ ]\!]_{ds}$ and the return functional $\mathcal{R}_{ds}$.

Lemma 11.1.1. Let $p \in IP[s^*,e^*]$, and $i, j \in \{1,\ldots,\lambda_p\}$ such that $p_i \in N_c^*$ and $p_j \in N_r^*$ are a pair of matching call and return nodes of $p$. We have:

$Transp^*_{(p_i)}(p[i,j]) \Rightarrow top([\![ p]i,\lambda_p] ]\!]^*_{ds}(newstack(false,\infty)))\downarrow_2 > StatLevel(fg(succ^*(p_i)))$



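Lemma 11.1.2.

1. $\forall n \in N^* \setminus N_c^*\ \forall stk \in STACK.$

   a) $Comp(n) \Rightarrow top([\![ n ]\!]^*_{ds}(stk))\downarrow_1 = true$

   b) $\neg Comp(n) \Rightarrow (top([\![ n ]\!]^*_{ds}(stk))\downarrow_1 = true \iff Transp(n) \wedge top(stk)\downarrow_1 = true)$

2. $\forall n \in N_c^*\ \forall stk \in STACK_{\ge 2}.$

   a) $Comp(n) \Rightarrow top([\![ n ]\!]^*_{ds}(stk))\downarrow_1 = true$

   b) $\neg Comp(n) \Rightarrow \Big(top([\![ n ]\!]^*_{ds}(stk))\downarrow_1 = true \iff \begin{cases} top(stk)\downarrow_1 = true & \text{if } Var(t) \subseteq GlobVar(fg(succ^*(n))) \\ top(pop(stk))\downarrow_1 = true \wedge top(stk)\downarrow_2 > StatLevel(fg(succ^*(n))) & \text{otherwise} \end{cases}\Big)$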

Descending Chain Condition. Note that neither the lattice $\mathbb{B} \times \mathcal{N}$ nor its induced function lattice satisfies the descending chain condition. Fortunately, however, only their finite sublattices $\mathbb{B} \times \mathcal{N}_\Pi$ and $[(\mathbb{B} \times \mathcal{N}_\Pi) \to (\mathbb{B} \times \mathcal{N}_\Pi)]$ are relevant for $A_{ds}$ (cf. Section 11.1.1). This is important because the finiteness of a lattice carries over to the corresponding function lattice. Thus, we have the following lemma, which is sufficient for our application.

Lemma 11.1.3 (Descending Chain Condition).

The function lattice $[(\mathbb{B} \times \mathcal{N}_\Pi) \to (\mathbb{B} \times \mathcal{N}_\Pi)]$ satisfies the descending chain condition.
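Finiteness bounds the length of every strictly descending chain by the number of lattice elements. As a small sanity check, the following Python sketch computes the number of elements of a longest strictly descending chain in $\mathbb{B} \times \mathcal{N}_\Pi$ for a given maximal static level; it assumes `math.inf` for $\infty$, and the function name is hypothetical. For a product of two chains with 2 and $L + 2$ elements the result is $L + 3$.

```python
import math
from functools import lru_cache
from itertools import product

INF = math.inf


def longest_descending_chain(max_level: int) -> int:
    """Number of elements of a longest strictly descending chain in B x N_Pi."""
    levels = list(range(max_level + 1)) + [INF]
    elems = list(product([False, True], levels))

    def lt(x, y):  # strict componentwise order
        return x != y and x[0] <= y[0] and x[1] <= y[1]

    @lru_cache(maxsize=None)
    def height(x):
        # Longest chain starting at x and descending strictly below it.
        return 1 + max((height(y) for y in elems if lt(y, x)), default=0)

    return max(height(x) for x in elems)


print(longest_descending_chain(2))
```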
Distributivity. We have:

Lemma 11.1.4 ($[\![\ ]\!]_{ds}, \mathcal{R}_{ds}$-Distributivity).

1. The local semantic functions $[\![ n ]\!]_{ds}$, $n \in N^*$, are distributive.

2. The return functions $\mathcal{R}_{ds}(n)$, $n \in N_c^*$, are distributive.

Proof. The first part of Lemma 11.1.4 is essentially a consequence of its intraprocedural counterpart Lemma 4.1.5 because the functions $[\![ n ]\!]_{ds}$, $n \in N^*$, treat the components of an argument independently of each other. Hence, it suffices to show that the local semantic functions are distributive in their second components, a fact which follows immediately from the distributivity of $Min$. Thus, we are left with the second part of Lemma 11.1.4. Abbreviating elements $((b_1,z_1),(b_2,z_2))$ of $(\mathbb{B} \times \mathcal{N})^2$ by $s$, we have to show

$\forall n \in N_c^*\ \forall S \subseteq (\mathbb{B} \times \mathcal{N})^2.\ \mathcal{R}_{ds}(n)(\sqcap S) = \sqcap \{ \mathcal{R}_{ds}(n)((b_1,z_1),(b_2,z_2)) \mid s \in S \} \qquad (11.1)$

To this end let $n \in N_c^*$, and $S \subseteq (\mathbb{B} \times \mathcal{N})^2$. Then Equation (11.1) is proved by investigating two cases.

Case 1. $Var(t) \subseteq GlobVar(fg(succ^*(n)))$

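In this case we obtain the following sequence of equalities proving Case 1.

$$
\begin{aligned}
&\mathcal{R}_{ds}(n)(Min(S))\\
&= \mathcal{R}_{ds}(n)(Min(\{((b_1,z_1),(b_2,z_2)) \mid s \in S\}))\\
&= \mathcal{R}_{ds}(n)((Min(\{(b_1,z_1) \mid s \in S\}),\ Min(\{(b_2,z_2) \mid s \in S\}))) && \text{(Def. "meet" in } (\mathbb{B}\times\mathcal{N})^2)\\
&= \mathcal{R}_{ds}(n)(((Min(\{b_1 \mid s \in S\}), Min(\{z_1 \mid s \in S\})),\ (Min(\{b_2 \mid s \in S\}), Min(\{z_2 \mid s \in S\})))) && \text{(Def. "meet" in } \mathbb{B}\times\mathcal{N})\\
&= (Min(\{b_2 \mid s \in S\}),\ Min(\{Min(\{z_1 \mid s \in S\}), Min(\{z_2 \mid s \in S\})\})) && \text{(Def. } \mathcal{R}_{ds}(n))\\
&= (Min(\{b_2 \mid s \in S\}),\ Min(\{Min(\{z_1, z_2\}) \mid s \in S\})) && \text{(Def. "meet" in } \mathcal{N})\\
&= Min(\{(b_2, Min(\{z_1, z_2\})) \mid s \in S\}) && \text{(Def. "meet" in } \mathbb{B}\times\mathcal{N})\\
&= Min(\{\mathcal{R}_{ds}(n)((b_1,z_1),(b_2,z_2)) \mid s \in S\}) && \text{(Def. } \mathcal{R}_{ds}(n))
\end{aligned}
$$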



Case 2. $Var(t) \not\subseteq GlobVar(fg(succ^*(n)))$

Abbreviating $StatLevel(fg(succ^*(n)))$ by $l$ we obtain:

$$
\begin{aligned}
&\mathcal{R}_{ds}(n)(Min(S))\\
&= \mathcal{R}_{ds}(n)(Min(\{((b_1,z_1),(b_2,z_2)) \mid s \in S\}))\\
&= \mathcal{R}_{ds}(n)((Min(\{(b_1,z_1) \mid s \in S\}),\ Min(\{(b_2,z_2) \mid s \in S\}))) && \text{(Def. "meet" in } (\mathbb{B}\times\mathcal{N})^2)\\
&= \mathcal{R}_{ds}(n)(((Min(\{b_1 \mid s \in S\}), Min(\{z_1 \mid s \in S\})),\ (Min(\{b_2 \mid s \in S\}), Min(\{z_2 \mid s \in S\})))) && \text{(Def. "meet" in } \mathbb{B}\times\mathcal{N})\\
&= (Comp(n) \vee (Min(\{b_1 \mid s \in S\}) \wedge Min(\{z_2 \mid s \in S\}) > l),\ Min(\{Min(\{z_1 \mid s \in S\}), Min(\{z_2 \mid s \in S\})\})) && \text{(Def. } \mathcal{R}_{ds}(n))\\
&= (Comp(n) \vee Min(\{Min(\{b_1 \mid s \in S\}), Min(\{z_2 \mid s \in S\}) > l\}),\ Min(\{Min(\{z_1, z_2\}) \mid s \in S\})) && \text{(Def. "meet" in } \mathbb{B})\\
&= (Min(\{Comp(n) \vee (b_1 \wedge (z_2 > l)) \mid s \in S\}),\ Min(\{Min(\{z_1, z_2\}) \mid s \in S\})) && \text{(Def. "meet" in } \mathcal{N})\\
&= Min(\{(Comp(n) \vee Min(\{b_1, z_2 > l\}),\ Min(\{z_1, z_2\})) \mid s \in S\}) && \text{(Def. "meet" in } \mathbb{B}\times\mathcal{N})\\
&= Min(\{\mathcal{R}_{ds}(n)((b_1,z_1),(b_2,z_2)) \mid s \in S\}) && \text{(Def. } \mathcal{R}_{ds}(n))
\end{aligned}
$$

This proves Case 2, and completes the proof of the second part of Lemma 11.1.4. □
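On the finite sublattice $\mathbb{B} \times \mathcal{N}_\Pi$, the distributivity of the return functions can also be confirmed by brute force. For a finite lattice, distributivity over nonempty meets reduces to repeated binary meets, so the following Python sketch checks $\mathcal{R}_{ds}(n)(s \sqcap s') = \mathcal{R}_{ds}(n)(s) \sqcap \mathcal{R}_{ds}(n)(s')$ for all pairs of arguments. The parameters `comp`, `t_vars_global`, and `level` model $Comp(n)$, the case distinction on Var(t), and the callee's static level; all names are hypothetical.

```python
import math
from itertools import product

INF = math.inf


def meet(x, y):
    """Componentwise minimum on B x N_Pi."""
    return (min(x[0], y[0]), min(x[1], y[1]))


def return_ds(comp, t_vars_global, level, after_call, entry):
    (b1, z1), (b2, z2) = after_call, entry
    b = b2 if t_vars_global else (comp or (b1 and z2 > level))
    return (b, min(z1, z2))


def is_distributive(comp, t_vars_global, level, max_level):
    """Check R(s meet s') == R(s) meet R(s') on all of (B x N_Pi)^2."""
    levels = list(range(max_level + 1)) + [INF]
    facts = list(product([False, True], levels))
    args = list(product(facts, facts))  # elements s = ((b1, z1), (b2, z2))
    for (a1, e1), (a2, e2) in product(args, args):
        lhs = return_ds(comp, t_vars_global, level, meet(a1, a2), meet(e1, e2))
        rhs = meet(return_ds(comp, t_vars_global, level, a1, e1),
                   return_ds(comp, t_vars_global, level, a2, e2))
        if lhs != rhs:
            return False
    return True


print(all(is_distributive(c, g, l, 2)
          for c in (False, True) for g in (False, True) for l in range(3)))
```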



ds-Precision. The last step, which is necessary in order to verify the precision of $A_{ds}$, is to prove that it is ds-precise. This means that interprocedural down-safety in the sense of Definition 10.4.1 coincides with the interprocedural meet over all paths solution of $A_{ds}$. Without loss of generality, we will only prove the first part of this theorem because it is the relevant one for the definition of the IBCM-transformation. The second part can be proved analogously.

Theorem 11.1.1 (ds-Precision).

For all nodes $n \in N^*$ we have:

1. $\textit{N-DSafe}^*(n)$ if and only if $Int_{ds}(\textit{X-IMOP}_{([\![\ ]\!]^*_{ds},(false,\infty))}(n))$

2. $\textit{X-DSafe}^*(n)$ if and only if $Int_{ds}(\textit{N-IMOP}_{([\![\ ]\!]^*_{ds},(false,\infty))}(n))$

Proof. As mentioned above, we will only prove the first part of Theorem 11.1.1. For convenience, we use X-IMOP and $stk_0$ as abbreviations of $Int_{ds} \circ \textit{X-IMOP}_{([\![\ ]\!]^*_{ds},(false,\infty))}$ and $newstack(false,\infty)$ throughout the proof.

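The first implication, "⇒",

$\forall n \in N^*.\ \textit{N-DSafe}^*(n) \Rightarrow \textit{X-IMOP}(n)$

is proved by showing the even stronger implication

$\forall p \in IP[s^*,e^*]\ \forall i \le \lambda_p.\ p_i = n \Rightarrow \big((\exists\, i \le j \le \lambda_p.\ Comp^*_{(p_i)}(p_j) \wedge Transp^*_{(p_i)}(p[i,j[)) \Rightarrow top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = true\big) \qquad (11.2)$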

We prove this implication simultaneously for all nodes $n \in N^*$ by induction on the length $k =_{df} \lambda_p - i + 1$ of the postfix-paths $p[i,\lambda_p]$ for all paths $p \in IP[s^*,e^*]$ and indexes $i \in \{1,\ldots,\lambda_p\}$ satisfying

$p_i = n \wedge \exists\, i \le j \le \lambda_p.\ Comp^*_{(p_i)}(p_j) \wedge Transp^*_{(p_i)}(p[i,j[) \qquad (11.3)$

Obviously, the case $k = 0$ does not occur, since we are dealing with interprocedural entry-down-safety. Thus, let $p[i,\lambda_p]$ be a postfix-path of length $k = 1$. This implies $p[i,\lambda_p] = (p_i)$, $p_i = e^*$, and therefore $p_i \in N^* \setminus N_c^*$. In this case, (11.3) yields

$Comp^*_{(p_i)}(p_i)$

Applying Lemma 10.2.1(1), this is equivalent to

$Comp(p_i)$

Hence, Lemma 11.1.2(1a) yields as desired

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}(stk_0))\downarrow_1 = true$

In order to prove the induction step, let $k > 1$, and assume that (11.2) holds for all postfix-paths $q[i,\lambda_q]$ with $\lambda_{q[i,\lambda_q]} < k$, i.e.,

(IH) $(\forall q \in IP[s^*,e^*].\ 1 \le \lambda_{q[i,\lambda_q]} < k).\ q_i = n \Rightarrow \big((\exists\, i \le j \le \lambda_q.\ Comp^*_{(q_i)}(q_j) \wedge Transp^*_{(q_i)}(q[i,j[)) \Rightarrow top([\![ q[i,\lambda_q] ]\!]^*_{ds}(stk_0))\downarrow_1 = true\big)$

It is sufficient to show that for every path $p \in IP[s^*,e^*]$ with $p_i = n$ and $\lambda_{p[i,\lambda_p]} = k$ satisfying (11.3),

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = true$

holds. Without loss of generality we can therefore assume that there is such a postfix-path $p[i,\lambda_p]$, which can be rewritten as $p[i,\lambda_p] = (p_i);p'$ with $p' = p]i,\lambda_p]$. Next, we must investigate two cases depending on the type of node $p_i$.

Case 1. $p_i \in N^* \setminus N_c^*$

If $p_i$ satisfies the predicate Comp, Lemma 11.1.2(1a) yields as desired

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}([\![ p' ]\!]^*_{ds}(stk_0)))\downarrow_1 = true$
If $p_i$ does not satisfy the predicate Comp, (11.3) guarantees the existence of an index $j$ with $i + 1 \le j \le \lambda_p$ such that

$Comp^*_{(p_i)}(p_j) \qquad (11.4)$

and

$Transp^*_{(p_i)}(p[i,j[) \qquad (11.5)$

are valid. Together with Lemma 10.2.4(1), (11.5) delivers

$Transp(p_i) \qquad (11.6)$

Moreover, due to (11.6) and Lemma 10.2.5(2), the first and third part of Lemma 10.2.3 yield

$RhsLev_p(i) = RhsLev_p(i + 1) \qquad (11.7)$

Combining (11.4), (11.5), and (11.7), we obtain

$Comp^*_{(p_{i+1})}(p_j) \wedge Transp^*_{(p_{i+1})}(p]i,j[)$

Hence, the induction hypothesis (IH) yields

$top([\![ p' ]\!]^*_{ds}(stk_0))\downarrow_1 = true \qquad (11.8)$

Combining (11.6) and (11.8), we obtain by means of Lemma 11.1.2(1b)

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}([\![ p' ]\!]^*_{ds}(stk_0)))\downarrow_1 = true$

which completes the proof of Case 1.

Case 2. $p_i \in N_c^*$

Similar to Case 1, we first assume that $p_i$ satisfies the predicate Comp. In this case the desired sequence of equations

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}([\![ p' ]\!]^*_{ds}(stk_0)))\downarrow_1 = true$

is an immediate consequence of Lemma 11.1.2(2a). Thus, we are left with the case that $p_i$ does not satisfy the predicate Comp. In this case we have to distinguish whether t contains local variables of $fg(succ^*(p_i))$ or not. This leads us to the following case analysis.

Case 2.1. $Var(t) \subseteq GlobVar(fg(succ^*(p_i)))$

In this case, Lemma 10.2.3(2) yields

$RhsLev_p(i) = RhsLev_p(i + 1) \qquad (11.9)$

Moreover, as in the previous case, (11.3) guarantees the existence of an index $j$ with $i + 1 \le j \le \lambda_p$ such that the conjunction

$Comp^*_{(p_i)}(p_j) \wedge Transp^*_{(p_i)}(p[i,j[) \qquad (11.10)$

holds. From (11.9) and (11.10) we obtain

$Comp^*_{(p_{i+1})}(p_j) \wedge Transp^*_{(p_{i+1})}(p]i,j[)$

The induction hypothesis (IH) now yields

$top([\![ p' ]\!]^*_{ds}(stk_0))\downarrow_1 = true \qquad (11.11)$

By means of (11.11), Lemma 11.1.2(2b) now yields as desired

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}([\![ p' ]\!]^*_{ds}(stk_0)))\downarrow_1 = true$

which completes the proof of Case 2.1.

Case 2.2. $Var(t) \not\subseteq GlobVar(fg(succ^*(p_i)))$

In this case, we obviously have

$Var(t) \cap LocVar(fg(succ^*(p_i))) \neq \emptyset \qquad (11.12)$

Let now $p_{i'}$ be the matching return node of $p_i$ on $p$, i.e., $p_i \in N_c^*$, $p_{i'} \in N_r^*$, and $p]i,i'[ \in CIP[succ^*(p_i), pred^*(p_{i'})]$. Because of (11.12), Lemma 10.2.7 delivers

$\neg Comp^*_{(p_i)}(p]i,i']) \qquad (11.13)$

Moreover, by means of (11.13) and (11.3) we obtain the existence of an index $j$ with $i' + 1 \le j \le \lambda_p$ such that

$Comp^*_{(p_i)}(p_j) \wedge Transp^*_{(p_i)}(p[i,j[) \qquad (11.14)$

holds. Additionally, Lemma 10.2.6(2) yields

$RhsLev_p(i) = RhsLev_p(i' + 1) \qquad (11.15)$

Together, (11.14) and (11.15) imply

$Comp^*_{(p_{i'+1})}(p_j) \wedge Transp^*_{(p_{i'+1})}(p]i',j[) \qquad (11.16)$

Abbreviating the path $p]i',\lambda_p]$ by $p''$, the induction hypothesis (IH) yields

$top([\![ p'' ]\!]^*_{ds}(stk_0))\downarrow_1 = true \qquad (11.17)$

Moreover, we have

$top(pop([\![ p' ]\!]^*_{ds}(stk_0))) = top([\![ p'' ]\!]^*_{ds}(stk_0)) \qquad (11.18)$

since intervals of call nodes and return nodes on interprocedural paths are either disjoint or one is included in the other (cf. Lemma 7.2.1). Combining (11.17) and (11.18) we get

$top(pop([\![ p' ]\!]^*_{ds}(stk_0)))\downarrow_1 = true \qquad (11.19)$

Moreover, by means of (11.14), Lemma 11.1.1 yields

$top([\![ p' ]\!]^*_{ds}(stk_0))\downarrow_2 > StatLevel(fg(succ^*(p_i))) \qquad (11.20)$

Now, the desired sequence of equalities

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}([\![ p' ]\!]^*_{ds}(stk_0)))\downarrow_1 = true$

follows from (11.19), (11.20), and Lemma 11.1.2(2b). This completes the proof of the first implication.

The second implication,

$\forall n \in N^*.\ \textit{X-IMOP}(n) \Rightarrow \textit{N-DSafe}^*(n)$

is proved by showing the even stronger implication

$\forall p \in IP[s^*,e^*]\ \forall i \le \lambda_p.\ p_i = n \Rightarrow \big(top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = true \Rightarrow \exists\, i \le j \le \lambda_p.\ Comp^*_{(p_i)}(p_j) \wedge Transp^*_{(p_i)}(p[i,j[)\big) \qquad (11.21)$

We prove this implication simultaneously for all nodes $n \in N^*$ by induction on the length $k =_{df} \lambda_p - i + 1$ of the postfix-paths $p[i,\lambda_p]$ for all paths $p \in IP[s^*,e^*]$ and indexes $i \in \{1,\ldots,\lambda_p\}$ satisfying

$p_i = n \wedge top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = true \qquad (11.22)$

As in the proof of the first implication, the case $k = 0$ does not occur, since we are dealing with interprocedural entry-down-safety. Thus, we start by considering a postfix-path $p[i,\lambda_p]$ of length $k = 1$. This implies $p[i,\lambda_p] = (p_i) = (e^*)$, and therefore $p_i \in N^* \setminus N_c^*$. Moreover, (11.22) yields

$top([\![ p[i,\lambda_p] ]\!]^*_{ds}(stk_0))\downarrow_1 = top([\![ p_i ]\!]^*_{ds}(stk_0))\downarrow_1 = true$

Additionally, we have

$top(stk_0)\downarrow_1 = false$

Hence, by means of Lemma 11.1.2(1) we obtain

$Comp(p_i)$

According to Lemma 10.2.1(1), this is equivalent to

$Comp^*_{(p_i)}(p_i)$

Thus, the induction basis follows for $j = i$.

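In order to prove the induction step, let $k > 1$, and assume that (11.21) holds
for all postfix-paths $q[i,\lambda_q]$ with $\lambda_{q[i,\lambda_q]} < k$, i.e.,

$$\text{(IH)}\quad \big(\forall\, q \in \mathbf{IP}[s^*,e^*].\; 1 \le \lambda_{q[i,\lambda_q]} < k\big).\; q_i = n \Rightarrow \big( top([\![\, q[i,\lambda_q] \,]\!]^*_{ds}(stk_0))\!\downarrow_1 = true \Rightarrow \exists\, i \le j \le \lambda_q.\; \mathit{Comp}^*_{(q,i)}(q_j) \wedge \mathit{Transp}^*_{(q,i)}(q[i,j[) \big)$$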

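It is sufficient to show that for every path $p \in \mathbf{IP}[s^*,e^*]$ with $p_i = n$ and
$\lambda_{p[i,\lambda_p]} = k$ satisfying (11.22) there holds

$$\exists\, i \le j \le \lambda_p.\; \mathit{Comp}^*_{(p,i)}(p_j) \wedge \mathit{Transp}^*_{(p,i)}(p[i,j[)$$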

Thus, without loss of generality we can assume that there is such a postfix-path
$p[i,\lambda_p]$, which then can be rewritten as $p[i,\lambda_p] = (p_i);p'$ with $p' = p]i,\lambda_p]$.
Similar to the proof of the first implication we have to investigate two cases 
depending on the type of node pi. 

Case 1. $p_i \in N^* \setminus N^*_c$

If $p_i$ satisfies the predicate $\mathit{Comp}$, the induction step follows immediately for
$j = i$. Thus, we are left with the case that $p_i$ does not satisfy the predicate
$\mathit{Comp}$. In this case Lemma 11.1.2(1b) delivers

$$\mathit{Transp}(p_i) \wedge top([\![\, p' \,]\!]^*_{ds}(stk_0))\!\downarrow_1 = true \tag{11.23}$$

Applying the induction hypothesis (IH) to $p'$ we obtain the existence of an
index $j_{p'}$ with $i+1 \le j_{p'} \le \lambda_p$ and

$$\mathit{Comp}^*_{(p,i+1)}(p_{j_{p'}}) \wedge \mathit{Transp}^*_{(p,i+1)}(p]i,j_{p'}[) \tag{11.24}$$

Moreover, due to $\mathit{Transp}(p_i)$, the first and third parts of Lemma 10.2.3 imply

$$\mathit{RhsLev}^{Var(t)}_p(i) = \mathit{RhsLev}^{Var(t)}_p(i+1) \tag{11.25}$$

Combining (11.23), (11.24), and (11.25), we obtain

$$\mathit{Comp}^*_{(p,i)}(p_{j_{p'}}) \wedge \mathit{Transp}^*_{(p,i)}(p[i,j_{p'}[)$$

Hence, the induction step follows for $j = j_{p'}$.

Case 2. $p_i \in N^*_c$

As in Case 1, if $p_i$ satisfies the predicate $\mathit{Comp}$, the induction step
follows trivially for $j = i$. If $p_i$ does not satisfy the predicate $\mathit{Comp}$, we have
to distinguish whether $t$ contains local variables of $fg(succ^*(p_i))$ or not.
This leads us to the following case analysis.

Case 2.1. $Var(t) \subseteq GlobVar(fg(succ^*(p_i)))$

In this case, we obtain by means of Lemma 11.1.2(2b)

$$top([\![\, p' \,]\!]^*_{ds}(stk_0))\!\downarrow_1 = true \tag{11.26}$$

and additionally, by means of Lemma 10.2.5(1),

$$\mathit{Transp}(p_i)$$





According to Lemma 10.2.1(2), $\mathit{Transp}(p_i)$ implies

$$\mathit{Transp}^*_{(p,i)}(p_i) \tag{11.27}$$

By means of (11.26) and the induction hypothesis (IH) we therefore obtain
the existence of an index $j_{p'}$ with $i+1 \le j_{p'} \le \lambda_p$ such that

$$\mathit{Comp}^*_{(p,i+1)}(p_{j_{p'}}) \wedge \mathit{Transp}^*_{(p,i+1)}(p]i,j_{p'}[) \tag{11.28}$$

holds. Moreover, Lemma 10.2.3(2) yields

$$\mathit{RhsLev}^{Var(t)}_p(i) = \mathit{RhsLev}^{Var(t)}_p(i+1) \tag{11.29}$$

Now, (11.27), (11.28), and (11.29) deliver

$$\mathit{Comp}^*_{(p,i)}(p_{j_{p'}}) \wedge \mathit{Transp}^*_{(p,i)}(p[i,j_{p'}[)$$

Thus, the induction step follows for $j = j_{p'}$ in Case 2.1.

Case 2.2. $Var(t) \not\subseteq GlobVar(fg(succ^*(p_i)))$

Here we have

$$Var(t) \cap LocVar(fg(succ^*(p_i))) \neq \emptyset$$

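Let $p_{i'}$ be the matching return node of $p_i$ on $p$, i.e., $p_i \in N^*_c$, $p_{i'} \in N^*_r$,
and $p]i,i'[ \;\in \mathbf{CIP}[succ^*(p_i), pred^*(p_{i'})]$. As a consequence of

$$top([\![\, p[i,\lambda_p] \,]\!]^*_{ds}(stk_0))\!\downarrow_1 = top([\![\, p_i \,]\!]^*_{ds}([\![\, p' \,]\!]^*_{ds}(stk_0)))\!\downarrow_1 = true$$

Lemma 11.1.2(2b) yields

$$top(pop([\![\, p' \,]\!]^*_{ds}(stk_0)))\!\downarrow_1 = true \tag{11.30}$$

and

$$top([\![\, p' \,]\!]^*_{ds}(stk_0))\!\downarrow_2 \;\ge\; \mathit{StatLevel}(fg(succ^*(p_i))) \tag{11.31}$$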

Since intervals of call nodes and return nodes on interprocedural paths are 
either disjoint or one is included in the other (cf. Lemma 7.2.1), we obtain 
by means of (11.30) 

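$$top([\![\, p'' \,]\!]^*_{ds}(stk_0))\!\downarrow_1 = true \tag{11.32}$$

where $p'' =_{df} p]i',\lambda_p]$. The induction hypothesis (IH), therefore, yields the
existence of an index $j_{p''}$ with $i'+1 \le j_{p''} \le \lambda_p$ and

$$\mathit{Comp}^*_{(p,i'+1)}(p_{j_{p''}}) \wedge \mathit{Transp}^*_{(p,i'+1)}(p]i',j_{p''}[) \tag{11.33}$$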

Moreover, (11.31) and Lemma 11.1.1 deliver

$$\mathit{Transp}^*_{(p,i)}(p[i,i']) \tag{11.34}$$

and Lemma 10.2.6(2) yields

$$\mathit{RhsLev}^{Var(t)}_p(i) = \mathit{RhsLev}^{Var(t)}_p(i'+1) \tag{11.35}$$

Combining (11.33), (11.34), and (11.35), we now obtain

$$\mathit{Comp}^*_{(p,i)}(p_{j_{p''}}) \wedge \mathit{Transp}^*_{(p,i)}(p[i,j_{p''}[)$$

Thus, the induction step follows for $j = j_{p''}$. This completes the proof of the
second implication, and finishes the proof of the relevant part of Theorem
11.1.1. □

Combining Lemma 11.1.3, Lemma 11.1.4, and Theorem 11.1.1, we obtain the
central result of this section: $\mathcal{A}_{ds}$ is precise for interprocedural down-safety.
This guarantees that the IMFP-solution computed by $\mathcal{A}_{ds}$ coincides with
the set of all program points which are interprocedurally down-safe in the
sense of Definition 10.4.1.

Theorem 11.1.2 ($\mathcal{A}_{ds}$-Precision).

$\mathcal{A}_{ds}$ is precise for interprocedural down-safety, i.e., $\mathcal{A}_{ds}$ is terminating and
ds-precise.



11.2 IDFA-Algorithm $\mathcal{A}_{ea}$: Interprocedural Earliestness

In this section we present the IDFA-algorithm $\mathcal{A}_{ea}$ for computing the set
of interprocedurally earliest program points.^ The main result of this section,
the $\mathcal{A}_{ea}$-Precision Theorem 11.2.2, guarantees that $\mathcal{A}_{ea}$ is precise for
this property: it terminates with the set of all program points which are
interprocedurally earliest in the sense of Definition 10.4.2.

11.2.1 Specification 

Data Flow Information. The domain of data flow information of $\mathcal{A}_{ea}$ is
given by the product lattice

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} \big(\mathbb{B} \times \mathcal{N},\; (Max, Min),\; (\ge, \le),\; (true, 0),\; (false, \infty)\big)$$

where $Max$ denotes the maximum function, which on $\mathbb{B}$ corresponds to
logical disjunction. In comparison to the down-safety analysis we use a
slightly different lattice for the earliestness analysis. This is caused by the fact
that the algorithms of Section 8.5 are tailored for computing the greatest
solution of an equation system, whereas the straightforward specification of
the earliestness analysis requires the computation of the least solution.
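The product lattice can be sketched in a few lines. This is a minimal illustration, not part of the original development: it assumes the level component is represented by Python integers with `math.inf` standing for $\infty$, and the names `meet`, `leq`, `BOTTOM`, and `TOP` are hypothetical.

```python
import math

# Elements of B x N are pairs (b, z): b is a bool, z a static level (int) or math.inf.
BOTTOM = (True, 0)          # least element (true, 0)
TOP = (False, math.inf)     # greatest element (false, infinity)

def meet(x, y):
    """Componentwise meet (Max, Min); Max on booleans is logical disjunction."""
    return (x[0] or y[0], min(x[1], y[1]))

def leq(x, y):
    """Componentwise order (>=, <=): x is below y iff x[0] >= y[0] and x[1] <= y[1]."""
    return x[0] >= y[0] and x[1] <= y[1]

print(meet((False, 2), (True, 5)))   # (True, 2)
```

Because the order on the Boolean component is reversed ($\ge$), `true` is the least Boolean value, which is what makes the least solution of the earliestness equations the one of interest.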

Intuitively, the first component of a data flow information attached to a
program point indicates whether a placement of $t$ is earliest at this point.
The second component has the same meaning as in the down-safety analysis:
it stores the static level of the “most globally” declared variable of $t$
which has been modified in the current procedure call. Hence, as for the
down-safety analysis, only the finite sublattice $\mathbb{B} \times \mathcal{N}_\Pi$ of $\mathbb{B} \times \mathcal{N}$ is relevant
for $\mathcal{A}_{ea}$.

^ The index ea stands for earliestness.

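Local Semantic Functional. The local semantic functional
$[\![\;]\!]_{ea} : N^* \to (\mathbb{B} \times \mathcal{N} \to \mathbb{B} \times \mathcal{N})$ of $\mathcal{A}_{ea}$ is defined by

$$\forall\, n \in N^*\; \forall\, (b,z) \in \mathbb{B} \times \mathcal{N}.\; [\![\, n \,]\!]_{ea}(b,z) =_{df} (b',z')$$

where

$$b' =_{df} \neg\mathit{Transp}(n) \vee (\neg\mathit{I\text{-}DSafe}(n) \wedge b)$$

$$z' =_{df} \begin{cases} Min(\{z, \mathit{ModLev}(n)\}) & \text{if } n \in N^* \setminus N^*_c \\ \infty & \text{otherwise} \end{cases}$$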

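Return Functional. The return functional
$\mathcal{R}_{ea} : N^*_r \to ((\mathbb{B} \times \mathcal{N})^2 \to \mathbb{B} \times \mathcal{N})$ of $\mathcal{A}_{ea}$ is defined by

$$\forall\, n \in N^*_r\; \forall\, ((b_1,z_1),(b_2,z_2)) \in (\mathbb{B} \times \mathcal{N})^2.\; \mathcal{R}_{ea}(n)((b_1,z_1),(b_2,z_2)) =_{df} (b',z')$$

where

$$b' =_{df} \begin{cases} b_2 \wedge \neg\mathit{I\text{-}DSafe}(n) & \text{if } Var(t) \subseteq GlobVar(fg(pred^*(n))) \\ (\neg\mathit{I\text{-}DSafe}(n_s) \wedge b_1) \vee (z_2 < \mathit{StatLevel}(fg(pred^*(n)))) & \text{otherwise} \end{cases}$$

$$z' =_{df} Min(\{z_1, z_2\})$$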

We remark that the definitions of the functionals $[\![\;]\!]_{ea}$ and $\mathcal{R}_{ea}$ rely on
essentially the same intuitions as the definitions of their counterparts in the
down-safety analysis.
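The two functionals can be transcribed directly. The sketch below is illustrative only: the record `Node` and its fields are hypothetical stand-ins for the predicates $\mathit{Transp}$, $\mathit{I\text{-}DSafe}$, membership in $N^*_c$, and $\mathit{ModLev}$, and it assumes that $\mathit{ModLev}(n) = \infty$ when $n$ modifies no variable of $t$.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical per-node facts used by the local functional of A_ea."""
    transp: bool = True          # Transp(n)
    i_dsafe: bool = False        # I-DSafe(n)
    is_call: bool = False        # n in N*_c
    mod_lev: float = math.inf    # ModLev(n); assumed inf if n modifies no variable of t

def local_ea(n, b, z):
    """[[n]]_ea(b, z) = (b', z')."""
    b_new = (not n.transp) or ((not n.i_dsafe) and b)
    z_new = math.inf if n.is_call else min(z, n.mod_lev)
    return (b_new, z_new)

def return_ea(vars_global, i_dsafe_n, i_dsafe_ns, stat_level, caller, callee):
    """R_ea(n)((b1, z1), (b2, z2)) for a return node n.

    caller = (b1, z1), callee = (b2, z2);
    vars_global stands for Var(t) subset of GlobVar(fg(pred*(n))).
    """
    (b1, z1), (b2, z2) = caller, callee
    if vars_global:
        b_new = b2 and not i_dsafe_n
    else:
        b_new = ((not i_dsafe_ns) and b1) or (z2 < stat_level)
    return (b_new, min(z1, z2))

print(local_ea(Node(transp=False), False, 3))   # (True, 3)
```

A call node resets the level component to $\infty$, reflecting that no variable of $t$ has yet been modified in the new procedure incarnation.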

Start Information. The start information of $\mathcal{A}_{ea}$ is given by the element

$$(s_1 \in Range(t), \infty) \in \mathbb{B} \times \mathcal{N}$$

Intuitively, the first component of the start information expresses that $t$ is
earliest at the beginning of the argument program if and only if the start
node of $\Pi$ belongs to the range of $t$. The second component reflects that no
variable of $t$ is assumed to have been modified before the execution of the
first statement of $\Pi$.

Interpretation. The interpretation of lattice elements in $\mathbb{B}$ is given by the
function $Int_{ea} : \mathbb{B} \times \mathcal{N} \to \mathbb{B}$ defined by

$$\forall\, (b,z) \in \mathbb{B} \times \mathcal{N}.\; Int_{ea}(b,z) =_{df} b$$

11.2.2 Proving Precision

Similar to Section 11.1.2, we first present two lemmas, which are useful for
the proof of the ea-Precision Theorem 11.2.1. Both lemmas can easily be
proved by means of the definitions of the functionals $[\![\;]\!]_{ea}$ and $\mathcal{R}_{ea}$.

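Lemma 11.2.1. Let $p \in \mathbf{IP}[s^*,e^*]$, and $i,j \in \{1,\ldots,\lambda_p\}$ such that
$p_i \in N^*_c$ and $p_j \in N^*_r$ are a pair of matching call and return nodes of $p$. We
have:

$$\mathit{Transp}^*_{(p,i)}(p[i,j]) \iff top([\![\, p[1,j[ \,]\!]^*_{ea}(newstack(s_1 \in Range(t), \infty)))\!\downarrow_2 \;\ge\; \mathit{StatLevel}(fg(pred^*(p_j)))$$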

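Lemma 11.2.2.

1. $\forall\, n \in N^* \setminus N^*_r\; \forall\, stk \in STACK.$

   a) $\neg\mathit{Transp}(n) \Rightarrow top([\![\, n \,]\!]^*_{ea}(stk))\!\downarrow_1 = true$
   b) $\mathit{Transp}(n) \Rightarrow \big(top([\![\, n \,]\!]^*_{ea}(stk))\!\downarrow_1 = true \iff \neg\mathit{I\text{-}DSafe}(n) \wedge top(stk)\!\downarrow_1 = true\big)$

2. $\forall\, n \in N^*_r\; \forall\, stk \in STACK_{\ge 2}.$

   a) $\neg\mathit{Transp}(n) \Rightarrow \big(top([\![\, n \,]\!]^*_{ea}(stk))\!\downarrow_1 = true \iff (\neg\mathit{I\text{-}DSafe}(n_s) \wedge top(pop(stk))\!\downarrow_1 = true) \vee top(stk)\!\downarrow_2 < \mathit{StatLevel}(fg(pred^*(n)))\big)$
   b) $\mathit{Transp}(n) \Rightarrow \big(top([\![\, n \,]\!]^*_{ea}(stk))\!\downarrow_1 = true \iff \neg\mathit{I\text{-}DSafe}(n) \wedge top(stk)\!\downarrow_1 = true\big)$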

It is worth noting that in the second part of Lemma 11.2.2 the premise
$\mathit{Transp}(n)$ implies that all variables occurring in $t$ are global with respect to
the procedure $fg(pred^*(n))$. Correspondingly, $\neg\mathit{Transp}(n)$ implies that $t$
contains at least one local variable or formal value parameter of $fg(pred^*(n))$.
In the context of interprocedural paths, $\neg\mathit{Transp}(n)$ also implies that a
recursive procedure call is finished.

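Descending Chain Condition. As for $\mathcal{A}_{ds}$, also for $\mathcal{A}_{ea}$ only the finite
sublattices $\mathbb{B} \times \mathcal{N}_\Pi$ and $[(\mathbb{B} \times \mathcal{N}_\Pi) \to (\mathbb{B} \times \mathcal{N}_\Pi)]$ of $\mathbb{B} \times \mathcal{N}$ and
$[(\mathbb{B} \times \mathcal{N}) \to (\mathbb{B} \times \mathcal{N})]$ are relevant. Thus, we immediately get the following lemma,
which is sufficient for our application.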

Lemma 11.2.3 (Descending Chain Condition).

The function lattice $[(\mathbb{B} \times \mathcal{N}_\Pi) \to (\mathbb{B} \times \mathcal{N}_\Pi)]$ satisfies the descending chain
condition.

Distributivity. The distributivity of the local semantic functions and the
return functions for earliestness can be proved along the lines of Lemma
11.1.1. We have:

Lemma 11.2.4 ($[\![\;]\!]_{ea}$, $\mathcal{R}_{ea}$-Distributivity).

1. The local semantic functions $[\![\, n \,]\!]_{ea}$, $n \in N^*$, are distributive.

2. The return functions $\mathcal{R}_{ea}(n)$, $n \in N^*_r$, are distributive.
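Lemma 11.2.4 can be sanity-checked mechanically on a small finite stand-in for $\mathbb{B} \times \mathcal{N}_\Pi$. The sketch below assumes the levels $\{0, 1, 2, \infty\}$ and checks only binary meets, so it illustrates the lemma rather than proving it; all helper names are hypothetical and the functionals are transcribed from the definitions above.

```python
import itertools
import math

LEVELS = (0, 1, 2, math.inf)            # assumed small stand-in for the relevant levels
ELEMS = [(b, z) for b in (False, True) for z in LEVELS]

def meet(x, y):
    return (x[0] or y[0], min(x[1], y[1]))

def local_ea(transp, i_dsafe, is_call, mod_lev, x):
    b, z = x
    return ((not transp) or ((not i_dsafe) and b),
            math.inf if is_call else min(z, mod_lev))

def return_ea(vars_global, ds_n, ds_ns, stat_level, x1, x2):
    (b1, z1), (b2, z2) = x1, x2
    if vars_global:
        b = b2 and not ds_n
    else:
        b = ((not ds_ns) and b1) or (z2 < stat_level)
    return (b, min(z1, z2))

def local_distributive():
    """f(x meet y) == f(x) meet f(y) for every local function and all x, y."""
    for transp, ds, call in itertools.product((False, True), repeat=3):
        for lev in LEVELS:
            f = lambda x: local_ea(transp, ds, call, lev, x)
            for x, y in itertools.product(ELEMS, repeat=2):
                if f(meet(x, y)) != meet(f(x), f(y)):
                    return False
    return True

def return_distributive():
    """Same check for the return functions; pairs are met componentwise."""
    pairs = list(itertools.product(ELEMS, repeat=2))
    for glob, ds_n, ds_ns in itertools.product((False, True), repeat=3):
        for lev in (0, 1, 2):
            g = lambda p: return_ea(glob, ds_n, ds_ns, lev, p[0], p[1])
            for p, q in itertools.product(pairs, repeat=2):
                pq = (meet(p[0], q[0]), meet(p[1], q[1]))
                if g(pq) != meet(g(p), g(q)):
                    return False
    return True

print(local_distributive(), return_distributive())   # True True
```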

ea-Precision. Finally, we must prove the coincidence of interprocedural
earliestness in the sense of Definition 10.4.2 with the interprocedural meet over
all paths solution induced by $\mathcal{A}_{ea}$, as stated by Theorem 11.2.1. Without loss
of generality we only prove the first part of this theorem because it is the
relevant one for the definition of the IBCM-transformation. We remark that
the second part can be proved analogously. We have:

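Theorem 11.2.1 (ea-Precision).

For all nodes $n \in N^*$ we have:

1. $N\text{-}\mathit{Earliest}^*(n)$ if and only if $Int_{ea}(N\text{-}\mathit{IMOP}_{([\![\;]\!]^*_{ea},\,(s_1 \in Range(t),\,\infty))}(n))$

2. $X\text{-}\mathit{Earliest}^*(n)$ if and only if $Int_{ea}(X\text{-}\mathit{IMOP}_{([\![\;]\!]^*_{ea},\,(s_1 \in Range(t),\,\infty))}(n))$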

Proof.

As mentioned before, we will only consider the first part of Theorem 11.2.1 in
detail. For convenience, we abbreviate $Int_{ea} \circ N\text{-}\mathit{IMOP}_{([\![\;]\!]^*_{ea},\,(s_1 \in Range(t),\,\infty))}$
and $newstack(s_1 \in Range(t), \infty)$ by $N\text{-}\mathit{IMOP}$ and $stk_0$ throughout the proof.



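The first implication, “⇒”,

$$\forall\, n \in N^*.\; N\text{-}\mathit{Earliest}^*(n) \Rightarrow N\text{-}\mathit{IMOP}(n)$$

is proved by equivalently showing

$$\forall\, p \in \mathbf{IP}[s^*,n].\; \big(\forall\, 1 \le i < \lambda_p.\; \mathit{I\text{-}DSafe}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(p,i)}(p[i,\lambda_p[)\big) \Rightarrow top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true \tag{11.36}$$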

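We prove (11.36) simultaneously for all nodes $n \in N^*$ by induction on the
length $k$ of path $p[1,\lambda_p[$ for all paths $p \in \mathbf{IP}[s^*,n]$ satisfying

$$\forall\, 1 \le i < \lambda_p.\; \mathit{I\text{-}DSafe}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(p,i)}(p[i,\lambda_p[) \tag{11.37}$$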

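If $k = 0$, we obtain $n = s^*$ and $p[1,\lambda_p[ \;= \varepsilon$. Moreover, combining (11.37) and
Lemma 10.3.1 we obtain $n \in Range(t)$. Hence, the desired sequence of
equalities

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = top([\![\, \varepsilon \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = top(stk_0)\!\downarrow_1 = true$$

holds trivially.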

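In order to prove the induction step, let $k > 0$, and assume that (11.36)
holds for all paths $q$ with $\lambda_{q[1,\lambda_q[} < k$, i.e.,

$$\text{(IH)}\quad \big(\forall\, q \in \mathbf{IP}[s^*,n].\; 0 \le \lambda_{q[1,\lambda_q[} < k\big).\; \big(\forall\, 1 \le i < \lambda_q.\; \mathit{I\text{-}DSafe}(q_i) \wedge \mathit{SameInc}_q[i,\lambda_q[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(q,i)}(q[i,\lambda_q[)\big) \Rightarrow top([\![\, q[1,\lambda_q[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$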

Now it is sufficient to show that for every path $p \in \mathbf{IP}[s^*,n]$ with $\lambda_{p[1,\lambda_p[} = k$
satisfying (11.37) there holds

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$

Without loss of generality we can assume that there is such a path $p$, which
then can be rewritten as $p = p';(m);(n)$ with $p' = p[1,k[$ and $m = p_k \in pred^*(n)$.
Next, we must investigate two cases depending on the type of node $m$.

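Case 1. $m \in N^* \setminus N^*_r$

If $\mathit{Transp}(m)$ does not hold,

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = top([\![\, m \,]\!]^*_{ea}([\![\, p' \,]\!]^*_{ea}(stk_0)))\!\downarrow_1 = true$$

follows immediately by means of Lemma 11.2.2(1a). Thus, we are left with
the case that $m$ satisfies the predicate $\mathit{Transp}$. According to the choice of
$p$ satisfying (11.37) we directly obtain

$$\neg\mathit{I\text{-}DSafe}(m) \tag{11.38}$$

since $\mathit{Transp}(m)$ implies $\mathit{SameInc}_p[k,\lambda_p[$, so that $\mathit{I\text{-}DSafe}(m)$ would, by (11.37),
require $\neg\mathit{Transp}^*_{(p,k)}(p_k)$, which is impossible by Lemma 10.2.1(2).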



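Thus, we have

$$\forall\, 1 \le i < k.\; \mathit{I\text{-}DSafe}(p_i) \wedge \mathit{SameInc}_p[i,k[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(p,i)}(p[i,k[)$$

Applying the induction hypothesis (IH) to $p[1,\lambda_p[ \;= p';(m)$, we get

$$top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true \tag{11.39}$$

Combining (11.38) and (11.39) with Lemma 11.2.2(1b) we obtain, as desired,

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = top([\![\, m \,]\!]^*_{ea}([\![\, p' \,]\!]^*_{ea}(stk_0)))\!\downarrow_1 = true$$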
Case 2. $m \in N^*_r$

If $\mathit{Transp}(m)$ holds, the proof proceeds as in the corresponding situation
of Case 1. Thus, we can assume that $\mathit{Transp}(m)$ does not hold. Together
with $n \in succ^*(m)$ and $n \in Range(t)$, this implies that we are finishing a
recursive procedure call, and that

$$Var(t) \cap LocVar(fg(pred^*(m))) \neq \emptyset$$

Let $\bar{m} =_{df} p_s$ be the matching call node of $m$ on $p'$. We first consider the
case

$$\mathit{I\text{-}DSafe}(\bar{m})$$

By means of Lemma 10.2.6 we get

$$\mathit{SameInc}_p[s,\lambda_p[$$

According to the choice of $p$ satisfying (11.37), there is an index $j$ with
$s \le j < \lambda_p$ and

$$\neg\mathit{Transp}^*_{(p,s)}(p_j)$$

Applying Lemma 11.2.1 we obtain

$$top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_2 \;<\; \mathit{StatLevel}(fg(pred^*(m))) \tag{11.40}$$



Now

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$

follows immediately from (11.40) and Lemma 11.2.2(2a).

Hence, we are left with the case that $\mathit{I\text{-}DSafe}(\bar{m})$ does not hold. Without
loss of generality we can assume

$$top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_2 \;\ge\; \mathit{StatLevel}(fg(pred^*(m)))$$

According to the choice of $p$ satisfying (11.37) and due to Lemma 10.2.6(1)
we therefore have

$$\forall\, 1 \le i < s.\; \mathit{I\text{-}DSafe}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(p,i)}(p[i,s[)$$

Applying the induction hypothesis (IH) to $p'' =_{df} p[1,s]$ yields

$$top([\![\, p''[1,s[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$

Hence, the application of Lemma 11.2.2(2a) completes the proof of the first
implication.

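The second implication, “⇐”,

$$\forall\, n \in N^*.\; N\text{-}\mathit{IMOP}(n) \Rightarrow N\text{-}\mathit{Earliest}^*(n)$$

is proved by showing the equivalent formula

$$\forall\, p \in \mathbf{IP}[s^*,n].\; top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true \Rightarrow \big(\forall\, 1 \le i < \lambda_p.\; \mathit{I\text{-}DSafe}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(p,i)}(p[i,\lambda_p[)\big) \tag{11.41}$$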

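We prove (11.41) simultaneously for all nodes $n \in N^*$ by induction on the
length $k$ of path $p[1,\lambda_p[$ for all paths $p \in \mathbf{IP}[s^*,n]$ satisfying

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true \tag{11.42}$$

If $k = 0$, we obtain $p[1,\lambda_p[ \;= \varepsilon$. Hence, we have

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = top([\![\, \varepsilon \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = top(stk_0)\!\downarrow_1 = true$$

Additionally, we obtain $n = s^*$, and therefore $s^* \in Range(t)$. Thus, (11.41)
holds trivially.

In order to prove the induction step, let $k > 0$, and assume that (11.41)
holds for all paths $q$ with $\lambda_{q[1,\lambda_q[} < k$, i.e.,

$$\text{(IH)}\quad \big(\forall\, q \in \mathbf{IP}[s^*,n].\; 0 \le \lambda_{q[1,\lambda_q[} < k\big).\; top([\![\, q[1,\lambda_q[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true \Rightarrow \big(\forall\, 1 \le i < \lambda_q.\; \mathit{I\text{-}DSafe}(q_i) \wedge \mathit{SameInc}_q[i,\lambda_q[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(q,i)}(q[i,\lambda_q[)\big)$$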

It is sufficient to show that for every path $p \in \mathbf{IP}[s^*,n]$ with $\lambda_{p[1,\lambda_p[} = k$
satisfying (11.42) there holds

$$\forall\, 1 \le i < \lambda_p.\; \mathit{I\text{-}DSafe}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \;\Rightarrow\; \neg\mathit{Transp}^*_{(p,i)}(p[i,\lambda_p[)$$

Thus, we can assume that there is such a path $p$, which then can be rewritten
as $p = p';(m);(n)$ with $p' = p[1,k[$ and $m = p_k \in pred^*(n)$. As in the
proof of the first implication we must now investigate two cases depending
on the type of node $m$.

Case 1. $m \in N^* \setminus N^*_r$

We first assume that $\mathit{Transp}(m)$ does not hold. If $m \in N^*_c$, the induction
step holds trivially, since there is no $r$ with $1 \le r \le k$ such that the predicate
$\mathit{SameInc}_p[r,\lambda_p[$ is satisfied. Otherwise, i.e., if $m \notin N^*_c$, we have

$$\mathit{SameInc}_p[k,\lambda_p[$$

Thus, for all $1 \le r \le k$ with $\mathit{I\text{-}DSafe}(p_r)$ and $\mathit{SameInc}_p[r,\lambda_p[$ we also have

$$\mathit{SameInc}_p[r,k[$$

Together with $\neg\mathit{Transp}(p_k)$ this directly implies, for all such $r$ as desired,

$$\neg\mathit{Transp}^*_{(p,r)}(p_k)$$

Hence, we are left with the case that $\mathit{Transp}(m)$ holds. Applying Lemma
11.2.2(1b) we get

$$\neg\mathit{I\text{-}DSafe}(m)$$

and

$$top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$

Moreover, $\mathit{Transp}(m)$ implies

$$\mathit{SameInc}_p[k,\lambda_p[$$

Hence, we obtain for $1 \le r < k$

$$\mathit{I\text{-}DSafe}(p_r) \wedge \mathit{SameInc}_p[r,\lambda_p[ \;\Rightarrow\; \mathit{I\text{-}DSafe}(p_r) \wedge \mathit{SameInc}_p[r,k[$$

The induction step now follows from applying the induction hypothesis (IH)
to $p'$.

Case 2. $m \in N^*_r$

If $\mathit{Transp}(m)$ holds, we have

$$\mathit{SameInc}_p[k,\lambda_p[$$

and by means of Lemma 11.2.2(2b) we get

$$\neg\mathit{I\text{-}DSafe}(m) \wedge top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$

In this case the induction step follows from applying the induction hypothesis
(IH) to $p'$ as above. Thus, we are left with the case that $\mathit{Transp}(m)$ does
not hold. Since $p$ is an interprocedurally valid path, there is a call node
$\bar{m} =_{df} p_s$ on $p$ which matches the return node $m = p_k$. According to Lemma
10.2.6(1)&(2) we have

$$\forall\, s < j < k.\; \neg\mathit{SameInc}_p[s,j]$$

Additionally, Lemma 10.2.6(2) yields

$$\mathit{SameInc}_p[s,k[$$

Consequently, for all $1 \le j < s$ with $\mathit{SameInc}_p[j,\lambda_p[$, we have

$$\mathit{SameInc}_p[j,s[$$

Now we first assume

$$top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_2 \;<\; \mathit{StatLevel}(fg(pred^*(m)))$$

Then, by means of Lemma 11.2.1 there is an index $u$ with $s \le u \le k$ and

$$\neg\mathit{Transp}^*_{(p,s)}(p_u)$$

In this case the induction step follows for $j = u$. Otherwise, i.e., whenever

$$top([\![\, p' \,]\!]^*_{ea}(stk_0))\!\downarrow_2 \;\ge\; \mathit{StatLevel}(fg(pred^*(m)))$$

holds, Lemma 11.2.2(2a) yields

$$\neg\mathit{I\text{-}DSafe}(p_s) \wedge top([\![\, p''[1,s[ \,]\!]^*_{ea}(stk_0))\!\downarrow_1 = true$$

where $p'' = p[1,s]$. The application of Lemma 10.2.6 and of the induction
hypothesis (IH) to $p''$ now delivers the induction step for the last case.
This finishes the proof of the relevant part of Theorem 11.2.1. □

Combining Lemma 11.2.3, Lemma 11.2.4, and Theorem 11.2.1 we now obtain
the main result of this section: $\mathcal{A}_{ea}$ is precise for interprocedural earliestness.
This guarantees that the IMFP-solution computed by $\mathcal{A}_{ea}$ coincides with
the set of all program points being interprocedurally earliest in the sense of
Definition 10.4.2.

Theorem 11.2.2 ($\mathcal{A}_{ea}$-Precision).

$\mathcal{A}_{ea}$ is precise for interprocedural earliestness, i.e., $\mathcal{A}_{ea}$ is terminating and
ea-precise.

11.3 IDFA-Algorithm $\mathcal{A}_{dl}$: Interprocedural Delayability

In this section we present the IDFA-algorithm $\mathcal{A}_{dl}$ for computing the set of
interprocedurally delayable program points.^ The main result of this section,
the $\mathcal{A}_{dl}$-Precision Theorem 11.3.2, shows that it is precise for this property:
it terminates with the set of all program points which are interprocedurally
delayable in the sense of Definition 10.5.1.



11.3.1 Specification 



Data Flow Information. The domain of data flow information of $\mathcal{A}_{dl}$ is
given by the lattice

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathbb{B}, \wedge, \le, false, true)$$



Intuitively, a data flow information attached to a program point expresses
whether an insertion of $t$ by the IBCM-transformation can be delayed to
this point. In contrast to the down-safety and the earliestness analysis it is
not necessary to work on a product lattice in order to keep track of the static
level of modified variables. This information is encoded in the down-safety
and earliestness information on which the delayability analysis relies.

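Local Semantic Functional. The local semantic functional
$[\![\;]\!]_{dl} : N^* \to (\mathbb{B} \to \mathbb{B})$ of $\mathcal{A}_{dl}$ is defined by

$$\forall\, n \in N^*\; \forall\, b \in \mathbb{B}.\; [\![\, n \,]\!]_{dl}(b) =_{df} b'$$

where

$$b' =_{df} \begin{cases} (b \vee \mathit{Insert}_{IBCM}(n)) \wedge \neg\mathit{Comp}(n) & \text{if } n \in N^* \setminus N^*_c \;\vee\; Var(t) \subseteq GlobVar(fg(succ^*(n))) \\ false & \text{otherwise} \end{cases}$$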



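The intuition underlying the definition of the local semantic functional is the
same as in the intraprocedural case. Intuitively, an insertion of the IBCM-transformation
can be delayed to the exit of a node $n$ if the term $t$ under
consideration is not blocked by $n$ (i.e., $\neg\mathit{Comp}(n)$), and if its insertion can
be delayed to the entry of $n$. This holds if $n$ is an insertion point of the
IBCM-transformation (i.e., $\mathit{Insert}_{IBCM}(n)$), or if the argument of $[\![\, n \,]\!]_{dl}$ is
true (i.e., $b = true$). The value $false$ for a call node whose called procedure
declares an operand of $t$ locally reflects that $t$ cannot be delayed into such a
procedure; delaying $t$ across such a call is handled by the return functional.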
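The local functional of $\mathcal{A}_{dl}$ is a one-line Boolean formula plus a special case for call nodes. The sketch below is only a transcription under assumed, hypothetical parameter names for the predicates $\mathit{Insert}_{IBCM}(n)$, $\mathit{Comp}(n)$, $n \in N^*_c$, and $Var(t) \subseteq GlobVar(fg(succ^*(n)))$.

```python
def local_dl(b, insert_ibcm, comp, is_call=False, vars_global_in_callee=True):
    """Sketch of [[n]]_dl(b).

    b                     -- information at the entry of n
    insert_ibcm, comp     -- Insert_IBCM(n) and Comp(n)
    is_call               -- whether n is a call node (n in N*_c)
    vars_global_in_callee -- Var(t) subset of GlobVar(fg(succ*(n))); only relevant for call nodes
    """
    if not is_call or vars_global_in_callee:
        return (b or insert_ibcm) and not comp
    return False   # t cannot be delayed into a callee that declares an operand of t locally

print(local_dl(False, True, False))   # True
```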



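^ The index dl stands for delayability.

Return Functional. The return functional $\mathcal{R}_{dl} : N^*_r \to (\mathbb{B}^2 \to \mathbb{B})$ of $\mathcal{A}_{dl}$
is defined by

$$\forall\, n \in N^*_r\; \forall\, (b_1,b_2) \in \mathbb{B}^2.\; \mathcal{R}_{dl}(n)(b_1,b_2) =_{df} b'$$

where

$$b' =_{df} \begin{cases} b_2 & \text{if } Var(t) \subseteq GlobVar(fg(pred^*(n))) \\ (b_1 \vee \mathit{Insert}_{IBCM}(n_s)) \wedge \neg\mathit{Comp}(n_c) & \text{otherwise} \end{cases}$$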



Intuitively, if some of the operands of $t$ are locally declared in the procedure
called (i.e., $Var(t) \not\subseteq GlobVar(fg(pred^*(n)))$), then $t$ cannot be used inside
this procedure, as this concerns a different incarnation of $t$. Thus, if $t$ is
delayable across the procedure call, then it is delayable in one big step and not
by moving it stepwise through the called procedure. This is possible
if $t$ is not passed as an actual parameter (i.e., $\neg\mathit{Comp}(n_c)$), and if the
insertion of $t$ can be delayed to the entry of the call node under consideration
(i.e., $b_1 \vee \mathit{Insert}_{IBCM}(n_s)$).^ On the other hand, if all variables of $t$ are
global for the called procedure (i.e., $Var(t) \subseteq GlobVar(fg(pred^*(n)))$), $t$ can
be delayed across the procedure call if it can successfully be delayed stepwise
to the end of the procedure called (i.e., $b_2 = true$). The correctness of this
definition is a consequence of the down-safety and earliestness information
encoded in the predicate $\mathit{Insert}_{IBCM}$.
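The case distinction of the return functional can be sketched as follows. The parameter names are hypothetical; following the intuition just given, `b1` is read as the information at the entry of the matching call and `b2` as the information at the end of the called procedure.

```python
def return_dl(b1, b2, vars_global, insert_ibcm_ns, comp_nc):
    """Sketch of R_dl(n)(b1, b2) for a return node n.

    b1             -- information at the entry of the matching call node
    b2             -- information at the end of the called procedure
    vars_global    -- Var(t) subset of GlobVar(fg(pred*(n)))
    insert_ibcm_ns -- Insert_IBCM(n_s); comp_nc -- Comp(n_c)
    """
    if vars_global:
        return b2                                   # t was delayed stepwise through the callee
    return (b1 or insert_ibcm_ns) and not comp_nc   # t is delayed across the call in one step

print(return_dl(True, False, False, False, False))   # True
```

Note that in the second case `b2` is ignored entirely: whatever happened inside the callee concerns a different incarnation of $t$.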

Start Information.

$$\mathit{Insert}_{IBCM}(s_1) \in \mathbb{B}$$

Intuitively, this choice of the start information reflects that the process of
moving computations in the direction of the control flow to “later” program
points is initiated at insertion points of the IBCM-transformation.

Interpretation. The interpretation of lattice elements in $\mathbb{B}$ is given by the
identity on $\mathbb{B}$, i.e., the function $Int_{dl} : \mathbb{B} \to \mathbb{B}$ is given by

$$Int_{dl} =_{df} Id_{\mathbb{B}}$$



11.3.2 Proving Precision

As for the down-safety and earliestness analysis, we first introduce two
lemmas, which simplify the proof of the dl-Precision Theorem 11.3.1. The
first lemma can be proved straightforwardly by means of the definitions of
the local semantic functional and the return functional of $\mathcal{A}_{dl}$. The second
lemma follows immediately from the definition of interprocedural delayability
(cf. Definition 10.5.1).

^ We recall that $n_s$ denotes the procedure call node in $S$ corresponding to $n$.

Lemma 11.3.1.

1. $\forall\, n \in N^* \setminus (N^*_c \cup N^*_r)\; \forall\, stk \in STACK.$

   a) $\mathit{Comp}(n) \Rightarrow top([\![\, n \,]\!]^*_{dl}(stk)) = false$
   b) $\neg\mathit{Comp}(n) \Rightarrow \big(top([\![\, n \,]\!]^*_{dl}(stk)) = true \iff \mathit{Insert}_{IBCM}(n) \vee top(stk) = true\big)$

2. $\forall\, n \in N^*_c\; \forall\, stk \in STACK.\; top([\![\, n \,]\!]^*_{dl}(stk)) = true \iff Var(t) \subseteq GlobVar(fg(succ^*(n))) \wedge \neg\mathit{Comp}(n) \wedge (top(stk) = true \vee \mathit{Insert}_{IBCM}(n))$

3. $\forall\, n \in N^*_r\; \forall\, stk \in STACK_{\ge 2}.$

$$top([\![\, n \,]\!]^*_{dl}(stk)) = true \iff \begin{cases} top(stk) = true & \text{if } Var(t) \subseteq GlobVar(fg(pred^*(n))) \\ (top(pop(stk)) = true \vee \mathit{Insert}_{IBCM}(n_s)) \wedge \neg\mathit{Comp}(n_c) & \text{otherwise} \end{cases}$$
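The three parts of Lemma 11.3.1 suggest how the stack-based semantics $[\![\; ]\!]^*_{dl}$ behaves along a path: ordinary nodes replace the top entry, a call node pushes the callee's start value while keeping the caller's entry value beneath it, and a return node combines the two topmost entries by $\mathcal{R}_{dl}$. The sketch below is a simplified model under exactly these assumptions; the node encoding and field names are hypothetical, and the precise definition of the stack semantics is the one given earlier in the monograph.

```python
def eval_path_dl(path, start):
    """Sketch of [[p]]*_dl(newstack(start)); the last list element is top(stk).

    Hypothetical node encodings:
      {"kind": "plain",  "insert": bool, "comp": bool}
      {"kind": "call",   "insert": bool, "comp": bool, "vars_global_callee": bool}
      {"kind": "return", "vars_global": bool, "insert_ns": bool, "comp_nc": bool}
    """
    stk = [start]                                    # newstack(start)
    for n in path:
        if n["kind"] == "plain":                     # replace top by [[n]]_dl(top)
            stk[-1] = (stk[-1] or n["insert"]) and not n["comp"]
        elif n["kind"] == "call":                    # keep caller's entry value, push callee's start value
            if n["vars_global_callee"]:
                stk.append((stk[-1] or n["insert"]) and not n["comp"])
            else:
                stk.append(False)
        else:                                        # return node: combine the two topmost entries
            b2 = stk.pop()                           # top(stk): end of the called procedure
            b1 = stk.pop()                           # top(pop(stk)): entry of the matching call
            if n["vars_global"]:
                stk.append(b2)
            else:
                stk.append((b1 or n["insert_ns"]) and not n["comp_nc"])
    return stk

# t is inserted, then a procedure declaring an operand of t locally is called and left.
example = [
    {"kind": "plain", "insert": True, "comp": False},
    {"kind": "call", "insert": False, "comp": False, "vars_global_callee": False},
    {"kind": "plain", "insert": False, "comp": False},
    {"kind": "return", "vars_global": False, "insert_ns": False, "comp_nc": False},
]
print(eval_path_dl(example, False))   # [True]
```

In the example the value `False` pushed for the callee does not matter: the insertion is delayed across the whole call in one step, exactly as described for the return functional.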



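Lemma 11.3.2. $\forall\, n \in N^*.\; N\text{-}\mathit{Delayable}^*(n) \iff \mathit{Insert}_{IBCM}(n) \vee$

$$\big(\forall\, p \in \mathbf{IP}[s^*,n]\; \exists\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[)\big)$$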
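For paths that never leave the current procedure incarnation, $\mathit{SameInc}_p$ holds trivially and, by Lemma 10.2.1(1), $\mathit{Comp}^*$ coincides with $\mathit{Comp}$. On such paths the path condition of Lemma 11.3.2 can be compared exhaustively with the forward evaluation of $[\![\;]\!]_{dl}$ started with $\mathit{Insert}_{IBCM}(p_1)$. The sketch below performs this comparison for all short paths; it only illustrates the intraprocedural core of Theorem 11.3.1(1), assumes $p_1 = s_1$, ignores the disjunct $\mathit{Insert}_{IBCM}(n)$, and uses hypothetical helper names.

```python
import itertools

def forward_dl(path):
    """top([[p[1, lambda_p[]]*_dl(newstack(Insert_IBCM(p_1)))) for a path of ordinary nodes.

    path lists (insert, comp) for p_1, ..., p_{lambda_p - 1}; the final node n is not processed.
    """
    b = path[0][0]                        # start information Insert_IBCM(s_1), assuming s_1 = p_1
    for ins, comp in path:
        b = (b or ins) and not comp
    return b

def path_condition(path):
    """Exists i with Insert_IBCM(p_i) and no computation of t on p[i, lambda_p[."""
    return any(path[i][0] and not any(c for _, c in path[i:])
               for i in range(len(path)))

def agree_up_to(max_len):
    """Compare both predicates on every path of length 1..max_len (k = 0 does not occur)."""
    nodes = list(itertools.product((False, True), repeat=2))
    for k in range(1, max_len + 1):
        for path in itertools.product(nodes, repeat=k):
            if forward_dl(list(path)) != path_condition(list(path)):
                return False
    return True

print(agree_up_to(6))   # True
```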



Descending Chain Condition. Obviously, $\mathbb{B}$ is finite and hence the
corresponding function lattice $[\mathbb{B} \to \mathbb{B}]$ is finite, too. Thus, we have:

Lemma 11.3.3 (Descending Chain Condition).

The function lattice $[\mathbb{B} \to \mathbb{B}]$ satisfies the descending chain condition.

Distributivity. We have:

Lemma 11.3.4 ($[\![\;]\!]_{dl}$, $\mathcal{R}_{dl}$-Distributivity).

1. The local semantic functions $[\![\, n \,]\!]_{dl}$, $n \in N^*$, are distributive.

2. The return functions $\mathcal{R}_{dl}(n)$, $n \in N^*_r$, are distributive.

Proof. The distributivity of the local semantic functions follows immediately
from the distributivity of the corresponding intraprocedural local semantic
functions (cf. Lemma 4.3.5). Thus, we are only left with proving the
distributivity of the return functions. This requires showing:

$$\forall\, n \in N^*_r\; \forall\, B' \subseteq \mathbb{B}^2.\; \mathcal{R}_{dl}(n)(\textstyle\bigwedge B') = \bigwedge \{ \mathcal{R}_{dl}(n)(b_1,b_2) \mid (b_1,b_2) \in B' \} \tag{11.43}$$

To this end let $n \in N^*_r$ and $B' \subseteq \mathbb{B}^2$. Two cases must now be investigated in
order to prove (11.43).
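Case 1. $Var(t) \subseteq GlobVar(fg(pred^*(n)))$

Here we obtain, as desired:

$$\begin{aligned}
\mathcal{R}_{dl}(n)(\textstyle\bigwedge B') &= \mathcal{R}_{dl}(n)(\textstyle\bigwedge \{ (b_1,b_2) \mid (b_1,b_2) \in B' \}) \\
(\text{Def. of } \textstyle\bigwedge) \quad &= \mathcal{R}_{dl}(n)(\textstyle\bigwedge \{ b_1 \mid (b_1,b_2) \in B' \}, \bigwedge \{ b_2 \mid (b_1,b_2) \in B' \}) \\
(\text{Def. of } \mathcal{R}_{dl}) \quad &= \textstyle\bigwedge \{ b_2 \mid (b_1,b_2) \in B' \} \\
(\text{Def. of } \mathcal{R}_{dl}) \quad &= \textstyle\bigwedge \{ \mathcal{R}_{dl}(n)(b_1,b_2) \mid (b_1,b_2) \in B' \}
\end{aligned}$$

Case 2. $Var(t) \not\subseteq GlobVar(fg(pred^*(n)))$

In this case we get the following sequence of equations:

$$\begin{aligned}
\mathcal{R}_{dl}(n)(\textstyle\bigwedge B') &= \mathcal{R}_{dl}(n)(\textstyle\bigwedge \{ (b_1,b_2) \mid (b_1,b_2) \in B' \}) \\
(\text{Def. of } \textstyle\bigwedge) \quad &= \mathcal{R}_{dl}(n)(\textstyle\bigwedge \{ b_1 \mid (b_1,b_2) \in B' \}, \bigwedge \{ b_2 \mid (b_1,b_2) \in B' \}) \\
(\text{Def. of } \mathcal{R}_{dl}) \quad &= (\textstyle\bigwedge \{ b_1 \mid (b_1,b_2) \in B' \} \vee \mathit{Insert}_{IBCM}(n_s)) \wedge \neg\mathit{Comp}(n_c) \\
(\text{Def. of } \textstyle\bigwedge) \quad &= \textstyle\bigwedge \{ (b_1 \vee \mathit{Insert}_{IBCM}(n_s)) \wedge \neg\mathit{Comp}(n_c) \mid (b_1,b_2) \in B' \} \\
(\text{Def. of } \mathcal{R}_{dl}) \quad &= \textstyle\bigwedge \{ \mathcal{R}_{dl}(n)(b_1,b_2) \mid (b_1,b_2) \in B' \}
\end{aligned}$$

which completes the proof of Lemma 11.3.4. □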
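Because $\mathbb{B}^2$ has only four elements, equation (11.43) can also be checked exhaustively for every choice of the node-specific predicates. The sketch below, with hypothetical helper names, enumerates all non-empty subsets $B' \subseteq \mathbb{B}^2$. It deliberately excludes $B' = \emptyset$: there $\bigwedge \emptyset = true$, and the two sides differ whenever $\mathit{Comp}(n_c)$ holds, so the sketch assumes that distributivity refers to non-empty sets, as is customary.

```python
import itertools

B2 = list(itertools.product((False, True), repeat=2))   # the set B^2

def return_dl(vars_global, insert_ns, comp_nc, b1, b2):
    """R_dl(n)(b1, b2) for one choice of the node-specific predicates."""
    return b2 if vars_global else ((b1 or insert_ns) and not comp_nc)

def meet_pairs(pairs):
    """Componentwise conjunction of a set of pairs in B^2 (True, True for the empty set)."""
    return (all(b1 for b1, _ in pairs), all(b2 for _, b2 in pairs))

def check_11_43():
    """Check (11.43) for every non-empty subset of B^2 and every choice of predicates."""
    for glob, ins, comp in itertools.product((False, True), repeat=3):
        for size in range(1, len(B2) + 1):
            for subset in itertools.combinations(B2, size):
                lhs = return_dl(glob, ins, comp, *meet_pairs(subset))
                rhs = all(return_dl(glob, ins, comp, b1, b2) for b1, b2 in subset)
                if lhs != rhs:
                    return False
    return True

print(check_11_43())   # True
```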

dl-Precision. The remaining step in showing the precision of $\mathcal{A}_{dl}$ is to
prove the coincidence of the interprocedural meet over all paths solution of
$\mathcal{A}_{dl}$ with interprocedural delayability in the sense of Definition 10.5.1, which
is stated by Theorem 11.3.1. We only prove the first part of this theorem,
which is the relevant one for the definition of the ILCM-transformation. The
second part can be proved in the very same fashion. We have:

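Theorem 11.3.1 (dl-Precision).

For all nodes $n \in N^*$ we have:

1. $N\text{-}\mathit{Delayable}^*(n)$ if and only if $\mathit{Insert}_{IBCM}(n) \vee Int_{dl}(N\text{-}\mathit{IMOP}_{([\![\;]\!]^*_{dl},\,\mathit{Insert}_{IBCM}(s_1))}(n))$

2. $X\text{-}\mathit{Delayable}^*(n)$ if and only if $Int_{dl}(X\text{-}\mathit{IMOP}_{([\![\;]\!]^*_{dl},\,\mathit{Insert}_{IBCM}(s_1))}(n))$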

Proof. As mentioned above, we will only prove the first part of Theorem
11.3.1. For convenience, we abbreviate $Int_{dl} \circ N\text{-}\mathit{IMOP}_{([\![\;]\!]^*_{dl},\,\mathit{Insert}_{IBCM}(s_1))}$ and
$newstack(\mathit{Insert}_{IBCM}(s_1))$ by $N\text{-}\mathit{IMOP}$ and $stk_0$ throughout the proof.

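The first implication, “⇒”,

$$\forall\, n \in N^*.\; N\text{-}\mathit{Delayable}^*(n) \Rightarrow \mathit{Insert}_{IBCM}(n) \vee N\text{-}\mathit{IMOP}(n)$$

is, according to Lemma 11.3.2, equivalent to

$$\forall\, n \in N^*.\; \Big(\mathit{Insert}_{IBCM}(n) \vee \big(\forall\, p \in \mathbf{IP}[s^*,n]\; \exists\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[)\big)\Big) \Rightarrow \mathit{Insert}_{IBCM}(n) \vee N\text{-}\mathit{IMOP}(n) \tag{11.44}$$

Clearly, (11.44) is trivial if $n$ satisfies the predicate $\mathit{Insert}_{IBCM}$. Thus, in
order to complete the proof of (11.44) it is sufficient to show

$$\forall\, p \in \mathbf{IP}[s^*,n].\; \big(\exists\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[)\big) \Rightarrow top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = true \tag{11.45}$$

which we prove simultaneously for all nodes $n \in N^*$ by induction on the
length $k$ of path $p[1,\lambda_p[$ for all paths $p$ satisfying

$$p \in \mathbf{IP}[s^*,n] \wedge \big(\exists\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[)\big) \tag{11.46}$$

Obviously, the case $k = 0$ does not occur. Thus, we can start by considering
a path $p[1,\lambda_p[$ of length $k = 1$. In this case we obtain $p[1,\lambda_p[ \;= (p_1)$ and
$p_1 = s^*$. By means of (11.46) we obtain

$$\mathit{Insert}_{IBCM}(p_1) \wedge \mathit{SameInc}_p[1,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,1)}(p_1) \tag{11.47}$$

Applying Lemma 10.2.1(1), (11.47) is equivalent to

$$\mathit{Insert}_{IBCM}(p_1) \wedge \mathit{SameInc}_p[1,\lambda_p[ \wedge \neg\mathit{Comp}(p_1) \tag{11.48}$$

Hence, by means of (11.48), Lemma 11.3.1(1b) yields, as desired,

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = top([\![\, p_1 \,]\!]^*_{dl}(stk_0)) = true$$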



In order to prove the induction step, let $k > 1$, and assume that (11.45) holds
for all paths $q$ with $\lambda_{q[1,\lambda_q[} < k$, i.e.,

$$\text{(IH)}\quad \big(\forall\, q \in \mathbf{IP}[s^*,n].\; 1 \le \lambda_{q[1,\lambda_q[} < k\big).\; \big(\exists\, 1 \le i < \lambda_q.\; \mathit{Insert}_{IBCM}(q_i) \wedge \mathit{SameInc}_q[i,\lambda_q[ \wedge \neg\mathit{Comp}^*_{(q,i)}(q[i,\lambda_q[)\big) \Rightarrow top([\![\, q[1,\lambda_q[ \,]\!]^*_{dl}(stk_0)) = true$$

It is sufficient to show that for every path $p \in \mathbf{IP}[s^*,n]$ with $\lambda_{p[1,\lambda_p[} = k$
satisfying (11.46) there holds

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = true \tag{11.49}$$

Without loss of generality we can assume that there is such a path $p$, which
then can be rewritten as $p = p';(m);(n)$ with $p' = p[1,k[$ and $m = p_k \in pred^*(n)$.
Next we must investigate three cases depending on the type of node $m$.

Case 1. $m \in N^* \setminus (N^*_c \cup N^*_r)$

If $\mathit{Insert}_{IBCM}(m)$ holds, (11.46) implies

$$\neg\mathit{Comp}^*_{(p,k)}(p_k)$$

By means of Lemma 10.2.1(1) we therefore have

$$\neg\mathit{Comp}(m)$$

Now, Lemma 11.3.1(1b) yields, as desired,

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = top([\![\, m \,]\!]^*_{dl}([\![\, p' \,]\!]^*_{dl}(stk_0))) = true$$

If $\mathit{Insert}_{IBCM}(m)$ does not hold, we obtain by Lemma 10.2.3(1)

$$\mathit{RhsLev}^{Var(t)}_p(k) = \mathit{RhsLev}^{Var(t)}_p(\lambda_p) \tag{11.50}$$

Thus, we have

$$\mathit{SameInc}_p[k,\lambda_p[$$

Additionally, (11.46) guarantees

$$\exists\, 1 \le i < k.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[) \tag{11.51}$$

Combining (11.50) and (11.51), we obtain

$$\exists\, 1 \le i < k.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,k[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,k[) \tag{11.52}$$

Hence, the induction hypothesis (IH) can be applied to $p'$, delivering

$$top([\![\, p' \,]\!]^*_{dl}(stk_0)) = true \tag{11.53}$$

Now, Lemma 11.3.1(1b) yields

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = top([\![\, m \,]\!]^*_{dl}([\![\, p' \,]\!]^*_{dl}(stk_0))) = true$$

which completes the proof of Case 1.

Case 2. $m \in N^*_c$

The case $Var(t) \not\subseteq GlobVar(fg(succ^*(m)))$ cannot occur, because it would
imply the following, which contradicts (11.46):

$$\forall\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \Rightarrow \neg\mathit{SameInc}_p[i,\lambda_p[$$

Thus, we can assume

$$Var(t) \subseteq GlobVar(fg(succ^*(m)))$$

The induction step can now be proved as in Case 1, using Lemma 10.2.3(2)
and Lemma 11.3.1(2) instead of Lemma 10.2.3(1) and Lemma 11.3.1(1b),
respectively.

Case 3. $m \in N^*_r$

Clearly, we have

$$\neg\mathit{Comp}(m) \wedge \neg\mathit{Insert}_{IBCM}(m) \tag{11.54}$$

We must now distinguish whether $t$ contains local variables of $fg(pred^*(m))$
or not. If all variables of $t$ are global with respect to $fg(pred^*(m))$, i.e.,
if $Var(t) \subseteq GlobVar(fg(pred^*(m)))$, the induction step follows as in Case
1 using Lemma 10.2.3(3) and Lemma 11.3.1(3) instead of Lemma 10.2.3(1)
and Lemma 11.3.1(1b), respectively. Thus, we are left with the case
$Var(t) \not\subseteq GlobVar(fg(pred^*(m)))$. Let $p_{k'}$ be the matching call node of $p_k = m$ on
$p$, i.e., $p_{k'} \in N^*_c$, $p_k \in N^*_r$, and $p]k',k[ \;\in \mathbf{CIP}[succ^*(p_{k'}), pred^*(p_k)]$. If
$\mathit{Insert}_{IBCM}(p_{k'})$ holds, the induction step follows immediately by means of
Lemma 11.3.1(3). Otherwise, i.e., if $\mathit{Insert}_{IBCM}(p_{k'})$ does not hold, (11.46)
implies

$$\exists\, 1 \le i < k'.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,k'[) \tag{11.55}$$

Moreover, Lemma 10.2.6(2) yields

$$\mathit{RhsLev}^{Var(t)}_p(k') = \mathit{RhsLev}^{Var(t)}_p(\lambda_p) \tag{11.56}$$

This implies

$$\mathit{SameInc}_p[i,\lambda_p[ \iff \mathit{SameInc}_p[i,k'[$$

and, therefore, (11.55) is equivalent to

$$\exists\, 1 \le i < k'.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,k'[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,k'[) \tag{11.57}$$

Abbreviating the path $p[1,k'[$ by $p''$, the induction hypothesis (IH) delivers

$$top([\![\, p'' \,]\!]^*_{dl}(stk_0)) = true \tag{11.58}$$

This implies

$$top(pop([\![\, p' \,]\!]^*_{dl}(stk_0))) = top([\![\, p'' \,]\!]^*_{dl}(stk_0)) \tag{11.59}$$

since intervals of call nodes and return nodes on interprocedural paths are
either disjoint or one is included in the other (cf. Lemma 7.2.1). Combining
(11.58) and (11.59) we get

$$top(pop([\![\, p' \,]\!]^*_{dl}(stk_0))) = true \tag{11.60}$$

The induction step

$$top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = top([\![\, m \,]\!]^*_{dl}([\![\, p' \,]\!]^*_{dl}(stk_0))) = true$$

now follows from (11.54) and (11.60) by means of Lemma 11.3.1(3).

The second implication, “⇐”,

$$\forall\, n \in N^*.\; \mathit{Insert}_{IBCM}(n) \vee N\text{-}\mathit{IMOP}(n) \Rightarrow N\text{-}\mathit{Delayable}^*(n) \tag{11.61}$$

holds trivially by Lemma 11.3.2 if $n$ satisfies the predicate $\mathit{Insert}_{IBCM}$. In
order to complete the proof of the second implication it is therefore sufficient
to show

$$\forall\, p \in \mathbf{IP}[s^*,n].\; top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = true \Rightarrow \exists\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[) \tag{11.62}$$

We prove this formula simultaneously for all nodes $n \in N^*$ by induction on
the length $k$ of path $p[1,\lambda_p[$ for all paths $p$ satisfying

$$p \in \mathbf{IP}[s^*,n] \wedge top([\![\, p[1,\lambda_p[ \,]\!]^*_{dl}(stk_0)) = true \tag{11.63}$$

As in the proof of the first implication the case $k = 0$ does not occur, and
therefore we can start by considering a path $p[1,\lambda_p[$ of length $k = 1$. In
this case we obtain $p[1,\lambda_p[ \;= (p_1)$ and $p_1 = s^* \in N^* \setminus (N^*_c \cup N^*_r)$. Hence,
Lemma 11.3.1(1) yields

$$\neg\mathit{Comp}(p_1) \wedge \mathit{Insert}_{IBCM}(p_1)$$

where $\mathit{Insert}_{IBCM}(p_1) = top(stk_0)$, such that (11.62) holds trivially for $i = 1$.

In order to prove the induction step, let $k > 1$, and assume that (11.62) holds
for all paths $q$ with $\lambda_{q[1,\lambda_q[} < k$, i.e.,

$$\text{(IH)}\quad \big(\forall\, q \in \mathbf{IP}[s^*,n].\; 1 \le \lambda_{q[1,\lambda_q[} < k\big).\; top([\![\, q[1,\lambda_q[ \,]\!]^*_{dl}(stk_0)) = true \Rightarrow \big(\exists\, 1 \le i < \lambda_q.\; \mathit{Insert}_{IBCM}(q_i) \wedge \mathit{SameInc}_q[i,\lambda_q[ \wedge \neg\mathit{Comp}^*_{(q,i)}(q[i,\lambda_q[)\big)$$

It is sufficient to show that for every path $p \in \mathbf{IP}[s^*,n]$ with $\lambda_{p[1,\lambda_p[} = k$
satisfying (11.63) there holds

$$\exists\, 1 \le i < \lambda_p.\; \mathit{Insert}_{IBCM}(p_i) \wedge \mathit{SameInc}_p[i,\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i)}(p[i,\lambda_p[)$$

Thus, we can assume that there is such a path $p$, which then can be rewritten
as $p = p';(m);(n)$ with $p' = p[1,k[$, $m = p_k \in pred^*(n)$, and $k+1 = \lambda_p$. As in
the proof of the first implication we must now investigate three cases
depending on the type of node $m$.

Case 1. mG N* \{N* U N*) 

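In this case we obtain by Lemma 11.3.1(1)

$$\neg\mathit{Comp}(m) \tag{11.64}$$

and by Lemma 10.2.3(1)

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(k) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(\lambda_p) \tag{11.65}$$

Hence, we have

$$\mathit{SameInc}_p[k,\lambda_p[ \tag{11.66}$$

Moreover, Lemma 10.2.1(1) yields

$$\neg\mathit{Comp}(m) \Rightarrow \neg\mathit{Comp}^*_{(p,k)}(m) \tag{11.67}$$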
If $\mathit{Insert}_{IBCM}(m)$ holds, the induction step follows trivially from (11.64),
(11.66), and (11.67) for $i = k$. Otherwise, i.e., if $\mathit{Insert}_{IBCM}(m)$ does not
hold, Lemma 11.3.1(1b) yields

$$\mathit{top}([\![ p' ]\!]^*_{dl}(stk_0)) = \mathit{true} \tag{11.68}$$

Applying the induction hypothesis (IH) to $p';(m)$, we get

$$\exists\, 1 \le i_{p'} \le k.\ \mathit{Insert}_{IBCM}(p_{i_{p'}}) \wedge \mathit{SameInc}_p[i_{p'},k[ \wedge \neg\mathit{Comp}^*_{(p,i_{p'})}(p[i_{p'},k[) \tag{11.69}$$

According to (11.64), (11.66), and (11.67), (11.69) is equivalent to

$$\exists\, 1 \le i_{p'} \le k.\ \mathit{Insert}_{IBCM}(p_{i_{p'}}) \wedge \mathit{SameInc}_p[i_{p'},\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i_{p'})}(p[i_{p'},\lambda_p[)$$

Thus, the induction step follows for $i = i_{p'}$.

Case 2. $m \in N_c^*$

In this case, Lemma 11.3.1(2) directly yields

$$\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(m))) \wedge \neg\mathit{Comp}(m) \tag{11.70}$$

The induction step can now be proved along the lines of Case 1 using Lemma
10.2.3(2) and Lemma 11.3.1(2) instead of Lemma 10.2.3(1) and Lemma
11.3.1(1b), respectively.

Case 3. $m \in N_r^*$

Clearly, in this case we have $\neg\mathit{Comp}(m)$, and can proceed with the following
case analysis.

Case 3.1. $\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(m)))$

In this case, Lemma 10.2.3(3) delivers

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(k) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(\lambda_p)$$

Moreover, Lemma 11.3.1(3) yields

$$\mathit{top}([\![ p' ]\!]^*_{dl}(stk_0)) = \mathit{true}$$

Applying the induction hypothesis (IH) to $p';(m)$, the induction step can be
proved as in Case 1.

Case 3.2. $\mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(m)))$

Let $p_{k'}$ be the matching call node of $p_k = m$ on $p$, i.e., $p_{k'} \in N_c^*$, $p_k \in N_r^*$,
and $p]k',k[\ \in \mathbf{CIP}[\mathit{succ}^*(p_{k'}),\mathit{pred}^*(p_k)]$. Lemma 11.3.1(3) then delivers

$$\neg\mathit{Comp}(p_{k'}) \tag{11.71}$$

and

$$\mathit{Insert}_{IBCM}(p_{k'}) \vee \mathit{top}([\![ p'' ]\!]^*_{dl}(stk_0)) = \mathit{true}$$

where $p'' =_{df} p[1,k'[$. Additionally, Lemma 10.2.6(2) and Lemma 10.2.7 yield

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(k') = \mathit{RhsLev}^{\mathit{Var}(t)}_p(\lambda_p) \tag{11.72}$$

and

$$\neg\mathit{Comp}^*_{(p,k')}(p[k',k]) \tag{11.73}$$

respectively. In particular, (11.72) implies

$$\mathit{SameInc}_p[k',\lambda_p[ \tag{11.74}$$

Thus, if $\mathit{Insert}_{IBCM}(p_{k'})$ holds, the induction step follows immediately for
$i = k'$. Otherwise, i.e., if $\mathit{Insert}_{IBCM}(p_{k'})$ does not hold, we have

$$\mathit{top}([\![ p'' ]\!]^*_{dl}(stk_0)) = \mathit{true}$$

Applying the induction hypothesis (IH) to $p'';(p_{k'})$, we obtain

$$\exists\, 1 \le i_{p''} \le k'.\ \mathit{Insert}_{IBCM}(p_{i_{p''}}) \wedge \mathit{SameInc}_p[i_{p''},k'[ \wedge \neg\mathit{Comp}^*_{(p,i_{p''})}(p[i_{p''},k'[) \tag{11.75}$$

Combining (11.72), (11.74), (11.71), and (11.73), we get that (11.75) is
equivalent to

$$\exists\, 1 \le i_{p''} \le k'.\ \mathit{Insert}_{IBCM}(p_{i_{p''}}) \wedge \mathit{SameInc}_p[i_{p''},\lambda_p[ \wedge \neg\mathit{Comp}^*_{(p,i_{p''})}(p[i_{p''},\lambda_p[) \tag{11.76}$$

Hence, the induction step follows for $i = i_{p''}$. This completes the proof of the
relevant part of Theorem 11.3.1. □



Applying Lemma 11.3.3, Lemma 11.3.4, and Theorem 11.3.1 we obtain the
main result of this section: $A_{dl}$ is precise for interprocedural delayability.
Consequently, the IMFP-solution computed by $A_{dl}$ coincides with the set of
all program points being interprocedurally delayable in the sense of Definition
10.5.1.

Theorem 11.3.2 ($A_{dl}$-Precision).

$A_{dl}$ is precise for interprocedural delayability, i.e., $A_{dl}$ is terminating and
dl-precise.



11.4 IDFA-Algorithm $A_{un}$: Interprocedural Unusability

In this section we present the specification of the IDFA-algorithm Aun for 
computing the set of interprocedurally unusable program points.® The central 
result of this section is the $A_{un}$-Precision Theorem 11.4.2. It states that the
algorithm is precise for this property: it terminates with the set of program
points that are interprocedurally unusable in the sense of Definition 10.5.3.



® The index un stands for unusability. 
We recall that the determination of unusable program points requires a
backward analysis of S (cf. Section 8.7 and Section 9.2.1). Therefore, the 
roles of call nodes and return nodes are interchanged in the definitions of 
the local semantic functions (cf. Section 11.4.1) and the return functions (cf. 
Section 11.4.1), and the start information is attached to the end node (cf. 
Section 11.4.1). 

11.4.1 Specification 

Data Flow Information. The domain of data flow information of $A_{un}$ is
given by the lattice

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathbb{B}, \wedge, \le, \mathit{false}, \mathit{true})$$

This is the same lattice as for the delayability analysis; however, the
interpretation is different. In the context of the unusability analysis, the data flow
information attached to a program point expresses whether a placement of
$t$ is unusable at this point or not. For the same reason as for the delayability
analysis, it is not necessary to use a product lattice for storing transparency
information explicitly (cf. Section 11.3.1).

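Local Semantic Functional. The local semantic functional $[\![\ ]\!]_{un} : N^* \to (\mathbb{B} \to \mathbb{B})$ of $A_{un}$ is defined by

$$\forall n \in N^*\ \forall b \in \mathbb{B}.\ [\![ n ]\!]_{un}(b) =_{df} b'$$

where

$$b' =_{df} \begin{cases} \textit{I-Latest}(n) \vee (\neg\mathit{Comp}(n) \wedge b) & \text{if } n \in N^* \setminus N_r^* \vee \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(n))) \\ \mathit{true} & \text{otherwise} \end{cases}$$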



Note that the local semantic functional is the straightforward extension of 
its intraprocedural counterpart to the interprocedural setting. 
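The case distinction of $[\![ n ]\!]_{un}$ can be transcribed almost literally. The following Python sketch is purely illustrative: the record `Node` and its fields `i_latest`, `comp`, `is_return` and `t_global_at_pred` are hypothetical names standing for $\textit{I-Latest}(n)$, $\mathit{Comp}(n)$, $n \in N_r^*$ and $\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(n)))$, which are assumed to be precomputed for a fixed term $t$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical per-node facts for a fixed term t (assumed precomputed)."""
    i_latest: bool          # I-Latest(n)
    comp: bool              # Comp(n): n computes t
    is_return: bool         # n is a return node, i.e. n in N_r*
    t_global_at_pred: bool  # Var(t) ⊆ GlobVar(fg(pred*(n))); only matters for return nodes


def local_un(n: Node, b: bool) -> bool:
    """Local semantic function [[n]]_un : B -> B (b is the information after n)."""
    if not n.is_return or n.t_global_at_pred:
        # Unusable before n if n is a potential insertion point, or if
        # n does not compute t and t is unusable after n.
        return n.i_latest or (not n.comp and b)
    # Return node of a procedure declaring an operand of t locally.
    return True
```

For instance, `local_un(Node(False, True, False, False), True)` is `False`: a node computing $t$ makes a placement directly before it usable.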

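Return Functional. The return functional $\mathcal{R}_{un} : N_c^* \to (\mathbb{B}^2 \to \mathbb{B})$ of $A_{un}$
is defined by

$$\forall n \in N_c^*\ \forall (b_1,b_2) \in \mathbb{B}^2.\ \mathcal{R}_{un}(n)(b_1,b_2) =_{df} b'$$

where

$$b' =_{df} \begin{cases} b_2 & \text{if } \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(n))) \\ \textit{I-Latest}(n) \vee (\neg\mathit{Comp}(n) \wedge b_1) & \text{otherwise} \end{cases}$$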

Intuitively, if some operands of $t$ are locally declared in the called procedure
(i.e., $\mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(n)))$), then $t$ cannot be used inside the
called procedure, as this concerns a different incarnation of $t$. Consequently,
$t$ is unusable in this case immediately before entering the procedure if it is
unusable immediately after leaving it (i.e., $b_1 = \mathit{true}$) and if it does not occur as
a parameter argument (i.e., $\neg\mathit{Comp}(n)$), or if the procedure call node is a
potential insertion point of the ILCM-transformation (i.e., $\textit{I-Latest}(n) = \mathit{true}$).
On the other hand, if all variables of $t$ are global for the procedure called (i.e.,
$\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(n)))$), $t$ is unusable before the procedure call if
this information could be propagated to the entry of the called procedure
(i.e., $b_2 = \mathit{true}$).
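In the same illustrative spirit, the return function can be written as a function of its two arguments. Here `t_global_in_callee` is a hypothetical flag standing for $\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(n)))$, and `b1`, `b2` play the roles of $b_1$ and $b_2$; this is a sketch under these naming assumptions, not code from the monograph.

```python
def return_un(i_latest: bool, comp: bool, t_global_in_callee: bool,
              b1: bool, b2: bool) -> bool:
    """Return function R_un(n)(b1, b2) for a call node n (illustrative sketch).

    b1: unusability immediately after leaving the callee (at the call site),
    b2: unusability propagated back to the entry of the called procedure.
    """
    if t_global_in_callee:
        # The callee works on the same incarnation of t: take the callee's view.
        return b2
    # Some operand of t is local to the callee, so t cannot be used there.
    return i_latest or (not comp and b1)
```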

Start Information. The start information is given by the element 

$\mathit{true} \in \mathbb{B}$

Note that this information is mapped to the end node of the program because 
unusability is a backward problem. Intuitively, it expresses that a computa- 
tion cannot be used after the termination of the program, and therefore a 
computation placed at the end of the program is unusable. 

Interpretation. The interpretation of lattice elements in $\mathbb{B}$ is given by the
identity on $\mathbb{B}$, i.e., the function $\mathit{Int}_{un} : \mathbb{B} \to \mathbb{B}$ is defined by

$$\mathit{Int}_{un} =_{df} \mathit{Id}_{\mathbb{B}}$$
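Restricted to a single procedure without call and return nodes, $A_{un}$ degenerates to an ordinary backward analysis, which may serve as a point of reference. The Python sketch below illustrates only that intraprocedural restriction, under stated assumptions: the flow graph encoding (`succ`, `end`, `facts`) is hypothetical, the start information $\mathit{true}$ is attached to the end node, the meet over successors is $\wedge$, and the iteration starts from the top element $\mathit{true}$. It does not model the stack-based treatment of procedure calls.

```python
from typing import Dict, List, Tuple


def unusable_mfp(succ: Dict[str, List[str]], end: str,
                 facts: Dict[str, Tuple[bool, bool]]) -> Dict[str, Tuple[bool, bool]]:
    """Round-robin backward fixpoint over (B, ∧, ≤, false, true).

    succ[n]  -- successors of node n (hypothetical encoding)
    facts[n] -- (I-Latest(n), Comp(n)) for a fixed term t
    Returns n -> (value at the entry of n, value at the exit of n).
    """
    entry = {n: True for n in succ}  # start from the top element true
    exit_ = {n: True for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # Start information true at the end node, meet over successors elsewhere.
            x = True if n == end else all(entry[m] for m in succ[n])
            i_latest, comp = facts[n]
            e = i_latest or (not comp and x)  # local semantic function
            if (e, x) != (entry[n], exit_[n]):
                entry[n], exit_[n] = e, x
                changed = True
    return {n: (entry[n], exit_[n]) for n in succ}
```

For the straight-line graph s → a → e in which only a computes $t$, this yields `{"s": (False, False), "a": (False, True), "e": (True, True)}`: a placement at the exit of s would be used by a, whereas one at the exit of a would not.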



11.4.2 Proving Precision

Similar to the other analyses, we first give a technical lemma, which simplifies
the proof of the un-Precision Theorem 11.4.1. This lemma can easily be
proved by means of the definitions of the local semantic functional and the
return functional of $A_{un}$.

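Lemma 11.4.1.

1. $\forall n \in N^* \setminus (N_c^* \cup N_r^*)\ \forall stk \in \mathit{STACK}.$

 a) $\textit{I-Latest}(n) \Rightarrow \mathit{top}([\![ n ]\!]^*_{un}(stk)) = \mathit{true}$

 b) $\neg\textit{I-Latest}(n) \Rightarrow \big(\mathit{top}([\![ n ]\!]^*_{un}(stk)) = \mathit{true} \iff \neg\mathit{Comp}(n) \wedge \mathit{top}(stk) = \mathit{true}\big)$

2. $\forall n \in N_c^*\ \forall stk \in \mathit{STACK}_{\ge 2}.$

 a) $\textit{I-Latest}(n) \Rightarrow \mathit{top}([\![ n ]\!]^*_{un}(stk)) = \mathit{true}$

 b) $\neg\textit{I-Latest}(n) \Rightarrow \Big(\mathit{top}([\![ n ]\!]^*_{un}(stk)) = \mathit{true} \iff \neg\mathit{Comp}(n) \wedge \begin{cases} \mathit{top}(stk) = \mathit{true} & \text{if } \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(n))) \\ \mathit{top}(\mathit{pop}(stk)) = \mathit{true} & \text{otherwise} \end{cases}\Big)$

3. $\forall n \in N_r^*\ \forall stk \in \mathit{STACK}.\ \mathit{top}([\![ n ]\!]^*_{un}(stk)) = \mathit{true} \iff \begin{cases} \mathit{top}(stk) = \mathit{true} & \text{if } \mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(n))) \\ \mathit{true} & \text{otherwise} \end{cases}$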
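The stack notation of Lemma 11.4.1 can be made concrete with a small simulation of $[\![ p ]\!]^*_{un}$ on a postfix path, processed from its last node backwards. The Python sketch below rests on an assumption of ours, not a definition quoted from the monograph: a return node pushes a new entry, a call node replaces the two topmost entries by $\mathcal{R}_{un}(n)(\mathit{top}(\mathit{pop}(stk)), [\![ n ]\!]_{un}(\mathit{top}(stk)))$, and every other node replaces the top entry by $[\![ n ]\!]_{un}(\mathit{top}(stk))$. This choice reproduces the three cases of the lemma; the encoding `PNode` is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PNode:
    kind: str               # "plain", "call" or "return" (hypothetical encoding)
    i_latest: bool = False  # I-Latest(n)
    comp: bool = False      # Comp(n)
    t_global: bool = True   # call: Var(t) ⊆ GlobVar(fg(succ*(n))); return: ... fg(pred*(n))


def local_un(n: PNode, b: bool) -> bool:
    if n.kind != "return" or n.t_global:
        return n.i_latest or (not n.comp and b)
    return True


def return_un(n: PNode, b1: bool, b2: bool) -> bool:
    return b2 if n.t_global else (n.i_latest or (not n.comp and b1))


def step(n: PNode, stk: List[bool]) -> List[bool]:
    """Assumed effect of [[n]]*_un on a stack whose top is stk[-1]."""
    if n.kind == "return":
        # Walking backwards into the callee: open a new entry.
        return stk + [local_un(n, stk[-1])]
    if n.kind == "call":
        # Walking backwards out of the callee: combine call-site and entry information.
        b2, b1 = stk[-1], stk[-2]
        return stk[:-2] + [return_un(n, b1, local_un(n, b2))]
    return stk[:-1] + [local_un(n, stk[-1])]


def path_un(path: List[PNode], start: bool = True) -> List[bool]:
    """[[path]]*_un(newstack(start)): the last node of the path is applied first."""
    stk = [start]
    for n in reversed(path):
        stk = step(n, stk)
    return stk
```

With `PNode("call", t_global=False)`, `PNode("plain", comp=True)` and `PNode("return", t_global=False)` as the path, `path_un` returns `[True]`, since the computation inside the callee concerns a different incarnation of $t$; with the default `t_global=True` for call and return node the result is `[False]`.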



Descending Chain Condition. The finiteness of $\mathbb{B}$ directly implies the
finiteness of its corresponding function lattice $[\mathbb{B} \to \mathbb{B}]$. Thus, we have:
Lemma 11.4.2 (Descending Chain Condition).

The function lattice satisfies the descending chain condition. 

Distributivity. Along the lines of Lemma 11.3.4 we can prove:

Lemma 11.4.3 ($[\![\ ]\!]_{un}$, $\mathcal{R}_{un}$-Distributivity Lemma).

1. The local semantic functions $[\![ n ]\!]_{un}$, $n \in N^*$, are distributive.

2. The return functions $\mathcal{R}_{un}(n)$, $n \in N_c^*$, are distributive.
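Since $\mathbb{B}$ is finite, the distributivity claims can also be checked by exhaustive enumeration. The Python sketch below restates the two functions with hypothetical boolean parameters (node facts first, lattice arguments last) and tests $f(a \wedge b) = f(a) \wedge f(b)$, using the componentwise meet on $\mathbb{B}^2$ for the return functions. It is a sanity check of the definitions as transcribed here, not a substitute for the proof.

```python
from itertools import product

B = (False, True)


def local_un(i_latest, comp, is_return, t_global, b):
    if not is_return or t_global:
        return i_latest or (not comp and b)
    return True


def return_un(i_latest, comp, t_global, b1, b2):
    return b2 if t_global else (i_latest or (not comp and b1))


def local_functions_distributive() -> bool:
    # Check [[n]](a ∧ b) == [[n]](a) ∧ [[n]](b) for every combination of node facts.
    for attrs in product(B, repeat=4):
        for a, b in product(B, repeat=2):
            if local_un(*attrs, a and b) != (local_un(*attrs, a) and local_un(*attrs, b)):
                return False
    return True


def return_functions_distributive() -> bool:
    # On B x B the meet is componentwise: (a1, a2) ∧ (b1, b2) = (a1 ∧ b1, a2 ∧ b2).
    for attrs in product(B, repeat=3):
        for a1, a2, b1, b2 in product(B, repeat=4):
            lhs = return_un(*attrs, a1 and b1, a2 and b2)
            rhs = return_un(*attrs, a1, a2) and return_un(*attrs, b1, b2)
            if lhs != rhs:
                return False
    return True
```

Both checks return `True`, as expected for functions of the shape $c \vee (d \wedge b)$, constants, and projections.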

un-Precision. Finally, we have to prove the coincidence of interprocedural
unusability and the interprocedural meet over all paths solution of $A_{un}$. As
in the previous sections, we only prove that part of the precision theorem
that is needed for the definition of the ILCM-transformation. We have:

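Theorem 11.4.1 (un-Precision Theorem).

For all nodes $n \in N^*$ we have:

1. $\textit{N-Unusable}^*(n)$ if and only if $\mathit{Int}_{un}(\textit{X-IMOP}_{([\![\ ]\!]^*_{un},\mathcal{R}_{un})}(\mathit{newstack}(\mathit{true}))(n))$

2. $\textit{X-Unusable}^*(n)$ if and only if $\mathit{Int}_{un}(\textit{N-IMOP}_{([\![\ ]\!]^*_{un},\mathcal{R}_{un})}(\mathit{newstack}(\mathit{true}))(n))$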

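Proof.

As mentioned above, we will only prove the second part of Theorem 11.4.1.
In order to simplify the notation, we abbreviate $\mathit{Int}_{un} \circ \textit{N-IMOP}_{([\![\ ]\!]^*_{un},\mathcal{R}_{un})}$
and $\mathit{newstack}(\mathit{true})$ by $\textit{N-IMOP}$ and $stk_0$ throughout the proof.

The first implication, “⟹”,

$$\forall n \in N^*.\ \textit{X-Unusable}^*(n) \Rightarrow \textit{N-IMOP}(n)$$

is proved by showing the even stronger implication

$$\forall p \in \mathbf{IP}[s^*,e^*].\ \Big(p_i = n \wedge \big(\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[\big)\Big) \Rightarrow \mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.77}$$

which we simultaneously prove for all nodes $n \in N^*$ by induction on the
length $k =_{df} \lambda_p - i$ of the postfix-path $p]i,\lambda_p]$ for all paths $p$ satisfying

$$p \in \mathbf{IP}[s^*,e^*] \wedge p_i = n \wedge \big(\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[\big) \tag{11.78}$$

If $k = \lambda_{p]i,\lambda_p]} = 0$, we obtain $p]i,\lambda_p] = \varepsilon$, and therefore

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ \varepsilon ]\!]^*_{un}(stk_0)) = \mathit{top}(stk_0) = \mathit{true}$$

holds trivially.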
In order to prove the induction step, let $k > 0$, and assume that (11.77)
holds for all postfix-paths $q]i,\lambda_q]$ with $\lambda_{q]i,\lambda_q]} < k$, i.e.,

(IH)
$$(\forall q \in \mathbf{IP}[s^*,e^*].\ q_i = n \wedge 0 \le \lambda_{q]i,\lambda_q]} < k).\ \big(\forall\, i+1 \le j \le \lambda_q.\ \mathit{Comp}^*_{(q,i+1)}(q_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(q_r) \wedge \mathit{SameInc}_q[i+1,r[\big) \Rightarrow \mathit{top}([\![ q]i,\lambda_q] ]\!]^*_{un}(stk_0)) = \mathit{true}$$

It is sufficient to show that for every path $p \in \mathbf{IP}[s^*,e^*]$ with $p_i = n$ and
$\lambda_{p]i,\lambda_p]} = k$ satisfying (11.78) the following holds:

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{true}$$

Thus, we can assume that there is such a postfix-path $p]i,\lambda_p]$, which can then
be rewritten as $p]i,\lambda_p] = (p_{i+1});p'$ with $p' = p[i+2,\lambda_p]$. Next, we have
to investigate three cases depending on the type of node $p_{i+1}$.
Case 1. $p_{i+1} \in N^* \setminus (N_c^* \cup N_r^*)$

If $p_{i+1}$ satisfies the predicate $\textit{I-Latest}$, Lemma 11.4.1(1a) yields as desired

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

Thus, we are left with the case that $p_{i+1}$ does not satisfy the predicate
$\textit{I-Latest}$. Together with (11.78) this implies

$$\neg\mathit{Comp}(p_{i+1}) \tag{11.79}$$

Moreover, Lemma 10.2.3(1) yields

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i+2) \tag{11.80}$$

Hence,

$$\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[ \tag{11.81}$$

is equivalent to

$$\forall\, i+2 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+2)}(p_j) \Rightarrow \exists\, i+2 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+2,r[ \tag{11.82}$$

Thus, the induction hypothesis (IH) can be applied to $p'$, yielding

$$\mathit{top}([\![ p' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.83}$$

Now, by means of (11.79) and (11.83), Lemma 11.4.1(1b) delivers

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

which completes the proof of Case 1.
Case 2. $p_{i+1} \in N_c^*$

If $p_{i+1}$ satisfies the predicate $\textit{I-Latest}$, the induction step

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

is an immediate consequence of Lemma 11.4.1(2a). Thus, we are left with the
case that $p_{i+1}$ does not satisfy the predicate $\textit{I-Latest}$. Similar to Case 1, this
implies

$$\neg\mathit{Comp}(p_{i+1}) \tag{11.84}$$

We must now distinguish whether $t$ contains local variables of $\mathit{fg}(\mathit{succ}^*(p_{i+1}))$
or not, and proceed with the following case analysis.

Case 2.1. $\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(p_{i+1})))$

In this case, Lemma 10.2.3(2) yields

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i+2) \tag{11.85}$$

As in Case 1, this allows us to apply the induction hypothesis to $p'$, yielding

$$\mathit{top}([\![ p' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.86}$$

The induction step

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

now follows directly from (11.84) and (11.86) by means of Lemma 11.4.1(2b).

Case 2.2. $\mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(p_{i+1})))$

Clearly, the premise of this case implies

$$\mathit{Var}(t) \cap \mathit{LocVar}(\mathit{fg}(\mathit{succ}^*(p_{i+1}))) \neq \emptyset \tag{11.87}$$

Let $p_{i'}$ be the matching return node of $p_{i+1}$ on $p$, i.e., $p_{i+1} \in N_c^*$, $p_{i'} \in N_r^*$,
and $p]i+1,i'[\ \in \mathbf{CIP}[\mathit{succ}^*(p_{i+1}),\mathit{pred}^*(p_{i'})]$. By means of (11.87), Lemma
10.2.7 yields

$$\forall j' \in \{i+2,\ldots,i'\}.\ \neg\mathit{Comp}^*_{(p,i+1)}(p_{j'}) \wedge \neg\mathit{SameInc}_p[i+1,j'[ \tag{11.88}$$

Moreover, Lemma 10.2.6(2) delivers

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i'+1) \tag{11.89}$$

Hence,

$$\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[ \tag{11.90}$$

is equivalent to

$$\forall\, i'+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i'+1)}(p_j) \Rightarrow \exists\, i'+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i'+1,r[ \tag{11.91}$$

Abbreviating the path $p]i',\lambda_p]$ by $p''$, the induction hypothesis (IH) delivers

$$\mathit{top}([\![ p'' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.92}$$

This implies

$$\mathit{top}(\mathit{pop}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{top}([\![ p'' ]\!]^*_{un}(stk_0)) \tag{11.93}$$

since intervals of call nodes and return nodes on interprocedural paths are
either disjoint or one is included in the other (cf. Lemma 7.2.1). From (11.92)
and (11.93) we get

$$\mathit{top}(\mathit{pop}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true} \tag{11.94}$$

The induction step

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

is now an immediate consequence of (11.84), (11.94), and Lemma 11.4.1(2b).



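Case 3. $p_{i+1} \in N_r^*$

If $\mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(p_{i+1})))$, the induction step

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

holds trivially by means of Lemma 11.4.1(3). Thus, we are left with the case

$$\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(p_{i+1}))) \tag{11.95}$$

Moreover, $p_{i+1} \in N_r^*$ implies

$$\neg\mathit{Comp}(p_{i+1}) \tag{11.96}$$

Additionally, Lemma 10.2.3(3) yields

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i+2) \tag{11.97}$$

Hence,

$$\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[ \tag{11.98}$$

is equivalent to

$$\forall\, i+2 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+2)}(p_j) \Rightarrow \exists\, i+2 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+2,r[ \tag{11.99}$$

Thus, the induction hypothesis (IH) can be applied to $p'$, which delivers

$$\mathit{top}([\![ p' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.100}$$

Now, by means of (11.100), Lemma 11.4.1(3) yields as desired

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

which proves the induction step in this case.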

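The second implication, “⟸”,

$$\forall n \in N^*.\ \textit{N-IMOP}(n) \Rightarrow \textit{X-Unusable}^*(n)$$

is proved by showing the even stronger implication

$$(\forall p \in \mathbf{IP}[s^*,e^*].\ p_i = n).\ \big(\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{true}\big) \Rightarrow \big(\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[\big) \tag{11.101}$$

which we simultaneously prove for all nodes $n \in N^*$ by induction on the
length $k =_{df} \lambda_p - i$ of the postfix-path $p]i,\lambda_p]$ for all paths $p$ satisfying

$$p \in \mathbf{IP}[s^*,e^*] \wedge p_i = n \wedge \mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.102}$$

The induction base for $k = \lambda_{p]i,\lambda_p]} = 0$ holds trivially.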

In order to prove the induction step, let $k > 0$, and assume that (11.101)
holds for all postfix-paths $q]i,\lambda_q]$ with $\lambda_{q]i,\lambda_q]} < k$, i.e.,

(IH)
$$(\forall q \in \mathbf{IP}[s^*,e^*].\ q_i = n \wedge 0 \le \lambda_{q]i,\lambda_q]} < k).\ \big(\mathit{top}([\![ q]i,\lambda_q] ]\!]^*_{un}(stk_0)) = \mathit{true}\big) \Rightarrow \big(\forall\, i+1 \le j \le \lambda_q.\ \mathit{Comp}^*_{(q,i+1)}(q_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(q_r) \wedge \mathit{SameInc}_q[i+1,r[\big)$$

It is sufficient to show that for every path $p \in \mathbf{IP}[s^*,e^*]$ with $p_i = n$ and
$\lambda_{p]i,\lambda_p]} = k$ satisfying (11.102) the following holds:

$$\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[ \tag{11.103}$$

Thus, we can assume that there is such a postfix-path $p]i,\lambda_p]$. This path can
then be rewritten as $p]i,\lambda_p] = (p_{i+1});p'$ with $p' = p[i+2,\lambda_p]$, and we fix an
index $j$ with $i+1 \le j \le \lambda_p$ and $\mathit{Comp}^*_{(p,i+1)}(p_j)$. Similar to the proof of the
first implication, we must now investigate three cases depending on the type
of node $p_{i+1}$.
Case 1. $p_{i+1} \in N^* \setminus (N_c^* \cup N_r^*)$

If $p_{i+1}$ satisfies the predicate $\textit{I-Latest}$, the induction step follows trivially for
$r = i+1$. If $p_{i+1}$ does not satisfy the predicate $\textit{I-Latest}$, Lemma 11.4.1(1b)
yields

$$\neg\mathit{Comp}(p_{i+1}) \wedge \mathit{top}([\![ p' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.104}$$

Applying the induction hypothesis (IH) to $p'$, we obtain

$$\forall\, i+2 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+2)}(p_j) \Rightarrow \exists\, i+2 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+2,r[ \tag{11.105}$$

In particular, this implies that

$$r_{p'} =_{df} \mathit{Min}(\{ r \mid i+2 \le r \le j \wedge \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+2,r[ \})$$

is well-defined. Moreover, Lemma 10.2.3(1) yields

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i+2) \tag{11.106}$$

Combining (11.104) and (11.106), we obtain that (11.105) is equivalent to

$$\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[ \tag{11.107}$$

Hence, the induction step follows for $r = r_{p'}$.

Case 2. $p_{i+1} \in N_c^*$

Similar to Case 1, whenever $p_{i+1}$ satisfies the predicate $\textit{I-Latest}$, the
induction step follows trivially for $r = i+1$. Thus, we are left with the
case that $p_{i+1}$ does not satisfy the predicate $\textit{I-Latest}$. By means of Lemma
11.4.1(2b) we directly get

$$\neg\mathit{Comp}(p_{i+1}) \tag{11.108}$$

We must now distinguish whether $t$ contains local variables of $\mathit{fg}(\mathit{succ}^*(p_{i+1}))$
or not, which leads to the investigation of the following two cases.

Case 2.1. $\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(p_{i+1})))$

In this case, Lemma 11.4.1(2b) yields

$$\mathit{top}([\![ p' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.109}$$

Hence, the induction hypothesis can be applied to $p'$. Moreover, Lemma
10.2.3(2) implies

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i+2) \tag{11.110}$$

The induction step now follows as in Case 1.
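Case 2.2. $\mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{succ}^*(p_{i+1})))$

In this case we have

$$\mathit{Var}(t) \cap \mathit{LocVar}(\mathit{fg}(\mathit{succ}^*(p_{i+1}))) \neq \emptyset$$

Let $p_{i'}$ be the matching return node of $p_{i+1}$ on $p$, i.e., $p_{i+1} \in N_c^*$, $p_{i'} \in N_r^*$,
and $p]i+1,i'[\ \in \mathbf{CIP}[\mathit{succ}^*(p_{i+1}),\mathit{pred}^*(p_{i'})]$. As a consequence of

$$\mathit{top}([\![ p]i,\lambda_p] ]\!]^*_{un}(stk_0)) = \mathit{top}([\![ p_{i+1} ]\!]^*_{un}([\![ p' ]\!]^*_{un}(stk_0))) = \mathit{true}$$

we obtain by means of Lemma 11.4.1(2b)

$$\mathit{top}([\![ p'' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.111}$$

where $p'' =_{df} p]i',\lambda_p]$. The induction hypothesis (IH), therefore, yields

$$\forall\, i'+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i'+1)}(p_j) \Rightarrow \exists\, i'+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i'+1,r[ \tag{11.112}$$

Similar to Case 1, this implies that

$$r_{p''} =_{df} \mathit{Min}(\{ r \mid i'+1 \le r \le j \wedge \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i'+1,r[ \})$$

is well-defined. Additionally, (11.108) and Lemma 10.2.7 yield

$$\neg\mathit{Comp}^*_{(p,i+1)}(p[i+1,i']) \tag{11.113}$$

Moreover, Lemma 10.2.6(2) yields

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i'+1) \tag{11.114}$$

Thus, (11.112) is equivalent to

$$\forall\, i+1 \le j \le \lambda_p.\ \mathit{Comp}^*_{(p,i+1)}(p_j) \Rightarrow \exists\, i'+1 \le r \le j.\ \textit{I-Latest}(p_r) \wedge \mathit{SameInc}_p[i+1,r[ \tag{11.115}$$

and therefore, the induction step follows for $r = r_{p''}$ in this case.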

Case 3. $p_{i+1} \in N_r^*$

Clearly, we have

$$\neg\mathit{Comp}(p_{i+1}) \tag{11.116}$$

If $\mathit{Var}(t) \not\subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(p_{i+1})))$, we have

$$\forall\, i+1 \le j \le \lambda_p.\ \neg\mathit{SameInc}_p[i+1,j[$$

Together with (11.116) this implies

$$\neg\mathit{Comp}^*_{(p,i+1)}(p[i+1,\lambda_p[)$$

Hence, the induction step holds trivially in this case. We are therefore left
with the case

$$\mathit{Var}(t) \subseteq \mathit{GlobVar}(\mathit{fg}(\mathit{pred}^*(p_{i+1})))$$

By means of Lemma 11.4.1(3) we get

$$\mathit{top}([\![ p' ]\!]^*_{un}(stk_0)) = \mathit{true} \tag{11.117}$$

Thus, the induction hypothesis can be applied to $p'$. Moreover, Lemma
10.2.3(3) yields

$$\mathit{RhsLev}^{\mathit{Var}(t)}_p(i+1) = \mathit{RhsLev}^{\mathit{Var}(t)}_p(i+2) \tag{11.118}$$

Hence, the induction step follows as in Case 1. This completes the proof of
the second implication, and finishes the proof of the relevant part of Theorem
11.4.1. □

Combining Lemma 11.4.2, Lemma 11.4.3, and Theorem 11.4.1, we get the
central result of this section: $A_{un}$ is precise for interprocedural unusability.
This guarantees that the IMFP-solution computed by $A_{un}$ coincides with
the set of program points that are interprocedurally unusable in the sense
of Definition 10.5.3.

Theorem 11.4.2 ($A_{un}$-Precision).

$A_{un}$ is precise for interprocedural unusability, i.e., $A_{un}$ is terminating and
un-precise.




12. Perspectives 



In this chapter we reconsider the anomalies observable for interprocedural
code motion from a different point of view, giving additional insight into
their causes and the differences to the intraprocedural setting. We show that
anomalies of this kind are not specific to the interprocedural setting, but have
analogues in other programming paradigms and application scenarios. Afterwards,
we discuss a variety of pragmatic aspects related to our framework,
and, finally, give directions for future work concerning both the framework
and the application side.



12.1 Reconsidering Code Motion Anomalies 

In Chapter 10 we demonstrated that there are essential differences between 
the intraprocedural and interprocedural setting regarding code motion. Most 
importantly, in the interprocedural setting (1) computationally optimal re- 
sults are in general impossible, and (2) safety is in general not equivalent to 
the disjunction of up-safety and down-safety. The loss of the decomposabil- 
ity of safety, i.e., the failure of the intraprocedural decomposition theorem 
for safety (cf. Lemma 3.2.1), turned out to be the source of several placing 
anomalies showing up when adapting intraprocedural code motion strategies 
to the interprocedural setting. Extensions of intraprocedurally computation- 
ally optimal strategies can dramatically degenerate a program, even if there 
is a computationally optimal canonic result (cf. Figure 10.18). 

Other paradigms and application scenarios. Because of the practical impact 
of the failure of the decomposition theorem for safety on interprocedural code 
motion, one should note that this failure is not specific to the interprocedural
setting. It has analogues in other programming paradigms and application
scenarios causing similar anomalies when extending intraprocedural
code motion strategies accordingly. In [KSV3], this has been demonstrated 
considering code motion in a setting with explicitly parallel programs and 
shared memory. For the same program setting, but a different application, 
the elimination of partially dead assignments, this has been demonstrated in 
[Kn2, Kn3]. In [KRS7], the failure of the decomposition theorem has been 
proved for the intraprocedural setting considering semantic code motion instead
of syntactic code motion as in this monograph. The point of syntactic code motion
is to treat all term patterns independently and to regard each assignment
as destructive for the value of every term pattern containing the modified 
variable. The algorithms for busy and lazy code motion are syntactic code 
motion algorithms. In contrast, semantic code motion aims at eliminating 
computations, which are semantically partially redundant, and is thus more 
powerful than syntactic code motion (cf. [BA, RWZ, St3, SKR1, SKR2]). In
this scenario, however, where safety is based on a semantic notion of term 
equivalence, it cannot equivalently be decomposed into the corresponding no- 
tions of up-safety and down-safety. As a consequence, an essential difference 
between “motion-based” and “placement-based” approaches for the elimi- 
nation of semantically partially redundant computations shows up: specific 
partially redundant computations are out of the scope of any motion-based 
approach, and can only be eliminated by placement-based approaches. Intu- 
itively, a placement-based approach is free to place a computation at any safe 
program point. In contrast, a motion-based approach is limited to program 
points, where the computation can “safely” be moved to, i.e., where in ad- 
dition all nodes in between the old and the new location are down-safe. As 
for interprocedural syntactic code motion, also for semantic code placement 
computationally optimal results are in general impossible. In fact, it is the 
validity of the decomposition theorem for safety applying to syntactic code 
motion in the intraprocedural setting, which implies that motion-based and 
placement-based approaches are of the same power for this scenario. 

Interprocedural code motion. We now return to the interprocedural setting, 
and focus again on interprocedural code motion. As demonstrated in Chapters
10 and 11, the placing anomalies exhibited by interprocedural extensions of
intraprocedural placing strategies are not caused by the data flow
analyses involved, but by the optimizing transformation itself. In the follow- 
ing, we reconsider these anomalies from a different point of view in order to 
give additional insight into the differences between the intraprocedural and 
interprocedural setting. To this end we consider the effect of the as-early- 
as-possible placing strategy underlying busy code motion for the example of 
Figure 10.18. Recall that the program of this example is free of any partially 
redundant computation, and hence, computationally optimal. As illustrated 
in Figure 10.19, it is dramatically degenerated by an application of the as- 
early-as-possible placing strategy. 

In order to reveal the difference to the intraprocedural setting, we extract 
the essence of this example and investigate it in a comparable intraprocedural 
situation established by means of procedure inlining, where we concentrate
on the procedures $\pi_3$ and $\pi_4$ first. The left program fragment of Figure
12.1 shows procedure $\pi_4$ after inlining $\pi_3$ together with the set of (entry)
down-safe and (entry) earliest nodes. The right program fragment shows the
corresponding program resulting from busy code motion. As in Figure 10.19,
an insertion has been moved into the loop to node 14, but the computation
originally located in the loop could be removed from it by moving it to node 20.
Hence, the programs are in the kernel of the relation “computationally
better”. In the example of Figure 10.18, the comparable transformation of
moving the computation from node 12 to node 20 is prevented by the second
call of procedure $\pi_3$ located in $\pi_2$, which results in the placing anomaly of
Figure 10.19. This call, however, is not taken into account in the modelling
of Figure 12.1.

Thus, for the sake of a “fair” comparison of the interprocedural and
intraprocedural setting, we have to consider the situation after inlining $\pi_3$ for
both of its calls, where the inlined procedure body is “shared” by the two
former calls. The right program fragment of Figure 12.2 then shows the result
of busy code motion. Obviously, the defect of the interprocedural situation
does not show up here. Intuitively, this is because the computation-free execution
path from node 6 to node 9 across node 15 is “visible” for program executions
reaching node 24, and prevents moving a + b into the loop to node 23.
This reveals the essential difference to the interprocedural situation of
Figure 10.18. Though the corresponding computation-free path is present as
well, it is “invisible” at the call site of $\pi_3$ in $\pi_4$: program executions entering
procedure $\pi_3$ at the call site at node 22 cannot be completed by program
continuations reaching node 9 because the call/return-behaviour of procedure
calls would be violated. The validity condition imposed on interprocedural
program paths excludes these paths.
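The validity condition can be illustrated by a tiny checker for paths that start at the start node of the program: every return must match the innermost call that has not yet returned, while calls may remain pending. The step encoding below (`("call", site)`, `("ret", site)`, `("node", name)`) is hypothetical and only meant to show why an execution that enters a procedure at one call site cannot continue after a different one.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # ("call", site), ("ret", site) or ("node", name)


def is_valid_path(path: List[Step]) -> bool:
    """Check call/return matching for a path starting at the program's start node.

    Pending (unreturned) calls are allowed, e.g. for paths ending inside a callee;
    a return is only allowed to the innermost pending call site.
    """
    pending: List[str] = []
    for kind, label in path:
        if kind == "call":
            pending.append(label)
        elif kind == "ret":
            if not pending or pending.pop() != label:
                return False
    return True
```

For example, `[("call", "c1"), ("node", "body"), ("ret", "c2")]` is rejected, although the underlying edges exist in the interprocedural flow graph.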

Intuitively, this can be considered the dilemma of customization of in- 
terprocedural DFA for program optimizations, where, like for code motion, 
the correctness and profitability of a modification at a specific program point 
(considering code motion, replacing an original computation by its corre- 
sponding temporary) relies on a “global” modification of the program text, 
i.e., on a modification involving several program points (considering code 
motion, inserting initializations of the temporary). Interprocedural DFA computes
information along valid paths only, and thus provides for every program
point, including call sites, the “most precise” information possible. Customization
can thus be considered the central goal of interprocedural DFA.
The problems arise on the transformation side, where customization of in- 
formation concerning call sites gets in conflict with transformations, where 
the global modifications required for separately justifying the “local” im- 
provements (considering code motion, those temporary initializations, which 
justify correctness at a particular use site) do not behave “compositionally”.
Considering code motion, this means inserting or removing computations 
from procedures, which are called at several call sites. This is the source of 
potential conflicts. They become effective, if as for instance in the example of 
Figure 10.19, from the point of view of some call site a computation must be 
inserted in (removed from) the callee, which, however, is prevented by a dif- 
ferent call site because safety and profitability would be affected. In essence, 
it is this kind of conflict causing the anomalies of interprocedural busy code
motion. Optimizations, which do not require “global” modifications in the
sense above, are in fact not affected by this dilemma. A prominent example
is constant propagation (cf. [Ke1, Ki1, Ki2, RL1, RL2, WZ1, WZ2]). It does
not rely on inserting new statements, but on modifying existing ones only.
Hence, its power depends directly on the precision of the DFA information,
and thus on its customization.

Fig. 12.1. Intraproc. situation of the “right” part of Fig. 10.18 (left: $\pi_4$ after inlining of $\pi_3$; right: after the BCM-transformation)




Fig. 12.2. Intraprocedural situation of Figure 10.18 (left: $\pi_2$ and $\pi_4$ after “shared” inlining of $\pi_3$; right: after the BCM-transformation)



Summary. Considering interprocedural code motion there is no general way 
of getting rid of these problems as computational optimality is in general 
impossible. In the following, we therefore sketch three orthogonal approaches 
centered around optimality and generality aiming at computational and life- 
time optimality for specific program classes and applicability to all programs 
without exhibiting anomalies, respectively. 

I. Abstract interpretation of control. Intuitively, the first approach is, in
addition to abstractly interpreting the semantics of a program, to also abstractly
interpret its control flow. An extreme, though not general, approach falling
into this group is to consider the interprocedural control flow graph of a program
as an intraprocedural flow graph, i.e., computing data flow information
according to all paths, not just interprocedurally valid ones, as suggested by
the example of Figure 12.2. This, however, is limited to data flow properties
that must be satisfied along all program executions reaching a specific
program point, e.g., availability. Properties that are justified if they are
valid along some program execution, e.g., partial availability, are not safely
approximated. Thus, problems simultaneously involving existentially and
universally quantified properties cannot safely be approximated, and hence are
not handled at all.

II. Keeping Optimality. The second approach, which we followed in this 
monograph, is characterized by retaining optimality for specific program 
classes at the price of losing generality. Canonicity of the IBCM-transformation
is the guarantor of the computational and lifetime optimality of
the interprocedural versions of busy and lazy code motion, respectively. As 
demonstrated, generality is lost: in the absence of canonicity, the result can 
dramatically be degenerated. It is an important matter of future work to 
characterize further program classes having a computationally and lifetime 
optimal counterpart together with an algorithm constructing them. Particularly
promising here is the class of programs having a computationally optimal
canonic counterpart. The IBCM- and ILCM-transformations provide a
good starting-point in order to enhance them accordingly.

III. Keeping Generality. Basically, the third approach is characterized by retaining generality at the price of losing optimality. In essence, this can be achieved by heuristically limiting the motion of computations across procedure boundaries in order to avoid anomalies. The algorithm of Morel and Renvoise of [MR2] is a prominent example following this approach. Another example is the algorithm of [KM2], which deals with the elimination of partially redundant and partially dead assignments in High Performance Fortran programs. Unfortunately, heuristics are often overly restrictive, unnecessarily reducing the transformational power of an algorithm. In the following we demonstrate this by illustrating the limitations of the heuristics which the algorithm of [MR2] imposes for avoiding (1) motion anomalies and (2) unnecessary motions of computations. The latter, as a side-effect, also avoids unnecessarily long lifetimes of temporaries.^

^ Morel and Renvoise did not introduce the second part of this heuristic as a means for limiting lifetimes. This is rather a side-effect of avoiding the unnecessary motions they were aiming at.

(1) Avoiding Motion Anomalies. Intuitively, the constraint introduced by Morel and Renvoise for avoiding motion anomalies prevents hoisting a computation across a procedure call if this requires an insertion inside the callee. Thus, in the example of Figure 10.18, the algorithm of [MR2] leaves the program invariant. However, this constraint is in general overly restrictive, as illustrated by the program of Figure 12.3. For this example, the algorithm of [MR2] generates the program of Figure 12.4 and fails to eliminate the partially redundant computation at node 22 inside the loop. The reason is that the motion constraint prevents moving a + b across the call sites of π3, because this would require an insertion inside the callee. The constraint is too restrictive because it ignores that, in this example, all call sites require an insertion inside the callee. For comparison, interprocedural busy code motion generates the computationally optimal canonic result of Figure 12.5.



Fig. 12.3. The limitations of the motion-heuristic of Morel and Renvoise



(2) Limiting the Lifetimes of Temporaries. Morel and Renvoise introduced a profitability constraint on moving computations into their pioneering intraprocedural code motion algorithm (cf. [MR1]). Its purpose was to avoid unnecessary motions of computations, which as a side-effect also avoids unnecessarily long lifetimes of variables. Under this constraint, computations are moved to earlier computation points only if they are partially redundant. Intuitively, this means that computations are only moved if at least one program path profits from the move. This avoids some, though not all, unnecessary motions of computations without affecting the computational optimality of the algorithm. As demonstrated by the example of Figure 12.6, this does not carry over to the interprocedural setting. Here, the algorithm of [MR2] generates the sub-optimal result of Figure 12.7. Limiting the motion of computations to partially redundant ones prevents hoisting the computation at node 8 to node 7. As a consequence, the insertions in π2 and π4 cannot be moved into procedure tts, and the motion process is blocked. In contrast, interprocedural lazy code motion generates the computationally and lifetime optimal canonic result of Figure 12.8.



Fig. 12.6. Limitations of the lifetime-heuristic of Morel and Renvoise

Fig. 12.7. The effect of Morel and Renvoise’s algorithm

Fig. 12.8. Computationally & lifetime opt. result of the ILCM-transformation



12.2 Pragmatics 

12.2.1 Static Procedure Nesting and Complexity of IDFA 

The framework we presented captures programs with statically nested procedures, which is important for its applicability to “real world” programs. It should be noted, however, that programs without statically nested procedures can often be analyzed more efficiently. We illustrate this by means of the down-safety and the earliestness analysis of the IBCM-transformation of Chapter 10. In a program containing statically nested procedures, the local variables of a procedure are global variables of its static successors. This introduces the phenomenon of variables being “relatively global”. It forces the down-safety and the earliestness analysis to keep track not only of the variables modified by an assignment, but also of the static level of their declaration. Considering the down-safety analysis, we therefore used the lattice

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathcal{B} \times \mathcal{N}, \mathit{Min}, \le, (\mathit{false}, 0), (\mathit{true}, \infty))$$

as the domain of relevant DFA-information. Intuitively, the second component of a data flow information stores the static level of the “most globally” declared variable which has been modified in the current procedure call.
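The following is a minimal sketch of the lattice above. It assumes that Min denotes the componentwise combination of conjunction on the first component and the numeric minimum on the second, and it encodes ∞ as math.inf. The class name DsInfo is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DsInfo:
    """Element of B x N used for down-safety with nested procedures.

    safe:  the down-safety bit (first component).
    level: static level of the most globally declared modified variable;
           math.inf means "no relevant modification so far".
    """
    safe: bool
    level: float

    def meet(self, other: "DsInfo") -> "DsInfo":
        # Assumed componentwise meet: AND on the bit, minimum on the level.
        return DsInfo(self.safe and other.safe, min(self.level, other.level))

    def leq(self, other: "DsInfo") -> bool:
        # Componentwise order: false <= true, numeric <= on levels.
        return (not self.safe or other.safe) and self.level <= other.level


BOTTOM = DsInfo(False, 0)
TOP = DsInfo(True, math.inf)

x = DsInfo(True, 2)   # down-safe; a variable declared on level 2 was modified
y = DsInfo(True, 1)   # down-safe; a more global level-1 variable was modified
print(x.meet(y))      # DsInfo(safe=True, level=1)
```

Taking the minimum keeps the most global modification, which is the one that matters when the information is propagated to outer static levels.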

In a program without statically nested procedures, the situation is much simpler. A variable is either “truly” global, if it is an external variable or is declared in the main procedure of the program, or it is “truly” local, if it is declared in one of its other procedures. Hence, the down-safety and the earliestness analyses need only distinguish between modifications of global and local variables. Regarding the down-safety analysis, this can be achieved by considering the simpler lattice

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathcal{B}^2, \wedge, \le, (\mathit{false}, \mathit{false}), (\mathit{true}, \mathit{true}))$$

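as the domain of relevant data flow information. We remark that the choice of this lattice relies on the same intuition as above: the second bit of a DFA information attached to a program point indicates whether a global variable of the term under consideration has been modified. Following [KRS4], the local semantic functional for down-safety simplifies to

$$[\![\ ]\!]'_{ds} : N^* \to (\mathcal{B}^2 \to \mathcal{B}^2)$$

defined by

$$\forall\, n \in N^*\ \forall\, (b_1, b_2) \in \mathcal{B}^2.\ [\![ n ]\!]'_{ds}(b_1, b_2) =_{df} (b'_1, b'_2)$$

with

$$b'_1 =_{df} \mathit{Comp}(n) \vee (\mathit{Transp}(n) \wedge b_1)$$

$$b'_2 =_{df} \begin{cases} b_2 \wedge \mathit{NoGlobalChanges}(n) & \text{if } n \in N^* \setminus N^*_c \\ \mathit{true} & \text{otherwise} \end{cases}$$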



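where the predicate $\mathit{NoGlobalChanges} : N^* \to \mathcal{B}$ is the analogue of the function ModLev of Section 10.2.1. It is defined by:

$$\forall\, n \in N^*.\ \mathit{NoGlobalChanges}(n) =_{df} \mathit{Var}_{\mathit{LhsVar}(n)}(t) \subseteq \mathit{LocVar}(\Pi \setminus \{\pi_1\})$$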

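Similarly, the definition of the return functional $\mathcal{R}_{ds}$ simplifies to

$$\mathcal{R}_{ds} : N^*_c \to (\mathcal{B}^2 \times \mathcal{B}^2 \to \mathcal{B}^2)$$

defined by

$$\forall\, n \in N^*_c\ \forall\, ((b_1, b_2), (b_3, b_4)) \in \mathcal{B}^2 \times \mathcal{B}^2.\ \mathcal{R}_{ds}(n)((b_1, b_2), (b_3, b_4)) =_{df} (b_5, b_6)$$

with

$$b_5 =_{df} \begin{cases} b_3 & \text{if } \mathit{Var}(t) \subseteq \mathit{GlobVar}(\Pi) \\ \mathit{Comp}(n) \vee (b_1 \wedge b_4) & \text{otherwise} \end{cases}$$

$$b_6 =_{df} b_2 \wedge b_4$$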



The start information, finally, is given by the element

$$(\mathit{false}, \mathit{true}) \in \mathcal{B} \times \mathcal{B}$$
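The simplified down-safety functionals can be transcribed almost literally. The following Python sketch is an assumption-laden transcription, not an implementation taken from [KRS4]. Node predicates such as Comp(n), Transp(n) and NoGlobalChanges(n) are passed as boolean flags. is_call stands for membership in the set of call nodes N*_c, and term_is_global stands for Var(t) ⊆ GlobVar(Π). We also assume that the first argument pair of the return functional carries the information flowing back from the call's successor and the second the contribution of the callee. All function names are hypothetical.

```python
from typing import Tuple

Info = Tuple[bool, bool]  # (down-safety bit, "no global operand changed" bit)

DS_START: Info = (False, True)  # start information of the (backward) analysis


def ds_meet(a: Info, b: Info) -> Info:
    # Meet of the lattice (B^2, AND, <=, (false, false), (true, true)).
    return (a[0] and b[0], a[1] and b[1])


def ds_local(info: Info, *, comp: bool, transp: bool,
             no_global_changes: bool, is_call: bool) -> Info:
    """Local functional for a node n; the predicates of n are passed as flags."""
    b1, b2 = info
    b1_new = comp or (transp and b1)
    b2_new = True if is_call else (b2 and no_global_changes)
    return (b1_new, b2_new)


def ds_return(after_call: Info, callee: Info, *, comp: bool,
              term_is_global: bool) -> Info:
    """Return functional for a call node n.

    after_call = (b1, b2): information flowing back from the call's successor.
    callee     = (b3, b4): abstract semantics contributed by the callee.
    """
    (b1, b2), (b3, b4) = after_call, callee
    b5 = b3 if term_is_global else (comp or (b1 and b4))
    b6 = b2 and b4
    return (b5, b6)


# A node computing t is down-safe regardless of what follows it.
print(ds_local(DS_START, comp=True, transp=False,
               no_global_changes=True, is_call=False))  # (True, True)
```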

Analogously, the earliestness analysis can also be based on a less complex lattice:

$$(\mathcal{C}, \sqcap, \sqsubseteq, \bot, \top) =_{df} (\mathcal{B}^2, (\vee, \wedge), (\ge, \le), (\mathit{true}, \mathit{false}), (\mathit{false}, \mathit{true}))$$
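The local abstract semantics is then given by the functional

$$[\![\ ]\!]'_{ea} : N^* \to (\mathcal{B}^2 \to \mathcal{B}^2)$$

defined by

$$\forall\, n \in N^*\ \forall\, (b_1, b_2) \in \mathcal{B}^2.\ [\![ n ]\!]'_{ea}(b_1, b_2) =_{df} (b'_1, b'_2)$$

with

$$b'_1 =_{df} \neg\mathit{Transp}(n) \vee (\neg\mathit{I\text{-}DSafe}(n) \wedge b_1)$$

$$b'_2 =_{df} \begin{cases} b_2 \wedge \mathit{NoGlobalChanges}(n) & \text{if } n \in N^* \setminus N^*_c \\ \mathit{true} & \text{otherwise} \end{cases}$$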

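Correspondingly, the return functional of the earliestness analysis simplifies to

$$\mathcal{R}_{ea} : N^*_c \to (\mathcal{B}^2 \times \mathcal{B}^2 \to \mathcal{B}^2)$$

defined by

$$\forall\, n \in N^*_c\ \forall\, ((b_1, b_2), (b_3, b_4)) \in \mathcal{B}^2 \times \mathcal{B}^2.\ \mathcal{R}_{ea}(n)((b_1, b_2), (b_3, b_4)) =_{df} (b_5, b_6)$$

with

$$b_5 =_{df} \begin{cases} b_3 \wedge \neg\mathit{I\text{-}DSafe}(n) & \text{if } \mathit{Var}(t) \subseteq \mathit{GlobVar}(\Pi) \\ \neg b_4 \vee (\neg\mathit{I\text{-}DSafe}(n_s) \wedge b_1) & \text{otherwise} \end{cases}$$

$$b_6 =_{df} b_2 \wedge b_4$$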



The start information is then given by the element

$$(s_1 \in \mathit{Range}(t), \mathit{true}) \in \mathcal{B} \times \mathcal{B}$$
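The earliestness functionals can be sketched in the same literal style. In this hypothetical Python transcription, node predicates are passed as boolean flags. is_call stands for membership in N*_c, term_is_global for Var(t) ⊆ GlobVar(Π), and i_dsafe_ns for I-DSafe(n_s). The value true that the local functional yields at call nodes is an assumption where the printed formula was damaged, chosen by analogy with the down-safety functional. All names are illustrative.

```python
from typing import Tuple

Info = Tuple[bool, bool]  # (earliestness bit, "no global operand changed" bit)


def ea_meet(a: Info, b: Info) -> Info:
    # Meet of (B^2, (OR, AND), (>=, <=), (true, false), (false, true)).
    return (a[0] or b[0], a[1] and b[1])


def ea_start(s1_in_range: bool) -> Info:
    # Start information (s1 in Range(t), true).
    return (s1_in_range, True)


def ea_local(info: Info, *, transp: bool, i_dsafe: bool,
             no_global_changes: bool, is_call: bool) -> Info:
    b1, b2 = info
    b1_new = (not transp) or ((not i_dsafe) and b1)
    # Assumed: a call node resets the second bit to true.
    b2_new = True if is_call else (b2 and no_global_changes)
    return (b1_new, b2_new)


def ea_return(at_call: Info, callee: Info, *, i_dsafe_n: bool,
              i_dsafe_ns: bool, term_is_global: bool) -> Info:
    (b1, b2), (b3, b4) = at_call, callee
    if term_is_global:
        b5 = b3 and not i_dsafe_n
    else:
        b5 = (not b4) or ((not i_dsafe_ns) and b1)
    return (b5, b2 and b4)
```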



In the remainder of this section we investigate the impact of static procedure nesting on the computational complexity of the IBCM-transformation. More generally, this leads us to the complexity of so-called bit-vector problems, of which the IBCM-transformation is a typical representative. Bit-vector problems are characterized by expressing the relevant data flow information in terms of a finite number of predicates. Though the number of predicates depends on the problem, it is independent of the size of the program being analysed. It thus only adds a constant factor to the complexity estimation. In practice, the number of mutually dependent predicates hardly ever exceeds two, so this factor is small.

As demonstrated in [KS2], the only blow-up in size for programs without statically nested procedures stems from the fact that bit-vector algorithms usually work on members of linearly growing classes, like the set of program variables or the set of program terms. This, however, adds a multiplicative factor to the worst-case time complexity even in the intraprocedural setting. The reason is that the properties of the different members are determined independently, as long as the program being analysed does not satisfy specific structural properties. In a program without statically nested procedures, a single member of such a class can be checked in linear time (cf. [KS2]). As usual, the proof of this result assumes that the argument program is “constant branching”, i.e., that the number of successors of a node in the flow graph system is bounded by a constant. The proof can be adapted straightforwardly to programs with statically nested procedures, which adds a factor quadratic in the nesting level to the complexity, as stated by Theorem 12.2.1. We remark that this theorem relies on the assumption of constant branching. This assumption can be dropped, however, using a refined argument on edges (cf. [KS2]).

Theorem 12.2.1. The worst-case time complexity of solving a bit-vector problem for a single member of the class of interest is $O(d^2 \cdot n)$, where $d$ and $n$ denote the level of the statically most deeply nested procedure and the number of nodes of a flow graph system, respectively.
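The observation that the properties of different members are determined independently can be illustrated with the usual bit-vector encoding. The sketch below is intraprocedural and hypothetical; function, node and term names are assumptions. It computes availability for all terms at once by packing one bit per term into a Python int. Each bit position evolves independently of all others, so the run is really one one-bit problem per term, executed with word-level operations. This amortizes, but does not remove, the factor contributed by the number of members.

```python
def available(succ, start, gen, kill, nbits):
    """Round-robin availability for nbits terms at once.

    gen/kill map nodes to bit masks; bit i never influences bit j.
    Returns the availability mask at the entry of every node.
    """
    full = (1 << nbits) - 1
    nodes = set(succ) | {m for ms in succ.values() for m in ms}
    preds = {n: [] for n in nodes}
    for n, ms in succ.items():
        for m in ms:
            preds[m].append(n)
    inn = {n: full for n in nodes}   # optimistic start (maximal fixpoint)
    inn[start] = 0                   # nothing available on entry
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            new = full
            for p in preds[n]:
                new &= gen.get(p, 0) | (inn[p] & ~kill.get(p, 0))
            if new != inn[n]:
                inn[n], changed = new, True
    return inn


A_PLUS_B, X_TIMES_Y = 0b01, 0b10        # hypothetical term numbering
graph = {"s": ["n1"], "n1": ["n2"], "n2": ["n3"]}
gen = {"n1": A_PLUS_B | X_TIMES_Y}      # n1 computes both terms
kill = {"n2": A_PLUS_B}                 # n2 assigns a
print(bin(available(graph, "s", gen, kill, 2)["n3"]))  # 0b10
```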

For realistic programs it is reasonable to consider d a constant. Under this assumption, Theorem 12.2.1 yields that interprocedural bit-vector problems can be solved within the same worst-case time complexity as their intraprocedural counterparts. However, a performance comparison in practice must also take into account the constant factor imposed by the nesting level. In essence, the worst-case time complexity is determined by the complexity of computing the semantic functions for procedures. This complexity in turn depends mainly on the maximal length of chains in the function lattice representing the domain of the abstract procedure semantics. Denoting by b the number of mutually dependent predicates under consideration, and by d the level of the statically most deeply nested procedure, the maximal chain length in the function lattice can be estimated by

$$2^b \cdot b$$

for programs without statically nested procedures and by

$$d^b \cdot (d \cdot b)$$

for programs with statically nested procedures (cf. [KS2]). Recalling that in practice d can be regarded as a constant, this does not affect the worst-case time complexity. It can nonetheless significantly influence the performance of an implementation. Transforming a program containing statically nested procedures into a program without statically nested procedures before analyzing it can thus often be profitable. Suitable transformations, together with general side-conditions of their applicability and effectiveness, are described in [La2, Ol1]. As a side-effect of these transformations, the resulting programs satisfy the sfmr-property, which is particularly important in the presence of formal procedure calls.
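The estimate 2^b · b for programs without statically nested procedures can be checked by brute force for small b. This assumes that the maximal chain length refers to the longest strict chain in the lattice of monotone functions from B^b to B^b under the pointwise order. The following hypothetical Python sketch enumerates these functions and computes the longest chain by dynamic programming. It does not check the estimate for nested procedures.

```python
from itertools import product


def leq(x, y):
    """Componentwise order on B^b (False <= True)."""
    return all((not xi) or yi for xi, yi in zip(x, y))


def monotone_function_height(b):
    """Length of the longest strict chain among monotone functions B^b -> B^b."""
    points = list(product([False, True], repeat=b))
    funcs = []
    for images in product(points, repeat=len(points)):
        f = dict(zip(points, images))
        if all(leq(f[x], f[y]) for x in points for y in points if leq(x, y)):
            funcs.append(images)

    def rank(f):
        return sum(sum(v) for v in f)  # strictly grows along a strict chain

    funcs.sort(key=rank)  # every strict predecessor is processed first
    longest = {}
    for f in funcs:
        longest[f] = max((longest[g] + 1 for g in longest
                          if g != f and all(leq(a, c) for a, c in zip(g, f))),
                         default=0)
    return max(longest.values())


for b in (1, 2):
    print(b, monotone_function_height(b), 2 ** b * b)
```

For b = 1 and b = 2 the enumeration yields 2 and 8, which matches 2^b · b. Larger b quickly becomes infeasible by brute force, because there are (2^b)^(2^b) candidate functions.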
12.2.2 Aliasing: Call by Name and Call by Reference

As indicated in Chapter 6, the HO-DFA of our framework can also be used for computing (correct, i.e., safe, approximations of) the sets of may-aliases and must-aliases of a name or reference parameter of a procedure (cf. [Ban, Co, CpK2, We]).^ Alternatively, alias analyses can also be formulated as IDFA-problems. A straightforward specification uses the power set lattice VS of variable identifiers occurring in a program as the domain of relevant DFA-information. Except for call nodes and return nodes, the local semantics of the statements of an interprocedural flow graph is given by the identity on VS. The semantics of call nodes and return nodes is concerned with establishing and releasing the alias relation between the arguments of a call statement and the parameters of the called procedure. Defining the meet operation in VS as set union yields an IDFA-algorithm for may-aliases; defining it as set intersection yields one for must-aliases. In contrast to the control flow free^ computation of aliases by means of the HO-DFA, the IDFA-algorithms realize a control flow dependent computation of aliases (cf. [CpK1]). The practical impact of this conceptual difference on precision and complexity remains to be investigated.
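The only difference between the may- and the must-variant of this specification is the meet operation. Below is a minimal, hypothetical sketch of that meet for the alias set of a reference parameter x of a procedure p at its entry. Each incoming call edge contributes the set of variables bound to x at that call site.

```python
from functools import reduce


def meet_aliases(incoming, may):
    """Combine alias sets arriving along different control flow edges.

    may=True  -> union (alias along some path: may-aliases)
    may=False -> intersection (alias along every path: must-aliases)
    """
    op = frozenset.union if may else frozenset.intersection
    return reduce(op, (frozenset(s) for s in incoming))


# Hypothetical procedure p(ref x), called as p(a) at one site and p(b) at another.
at_entry = [{"a"}, {"b"}]
print(sorted(meet_aliases(at_entry, may=True)))   # ['a', 'b']
print(sorted(meet_aliases(at_entry, may=False)))  # []
```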

In general, IDFA-applications are sensitive to alias information and must be tailored to use this information properly. For the IBCM- and ILCM-transformations, two effects of aliasing must be taken into account. First, a program term is modified if an alias of one of its operands is modified. This requires may-alias information. Second, a program term is computed if it is computed by any of its alias representations. This requires must-alias information. The IBCM- and ILCM-transformations can be adapted straightforwardly to capture both effects uniformly. For the IBCM-transformation this is demonstrated in [KRS4]. It is worth noting, however, that this is a step from a purely syntactically based code motion algorithm towards a semantically based one, since lexically different program terms are considered equivalent if corresponding operands are aliases of each other.^ Agrawal, Saltz, and Das have proposed a similar extension to semantically based code motion for communication placement in distributed languages, by semantically interpreting the assignments of the parameter transfer (cf. [ASD]).
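The two effects of aliasing on the local predicates can be sketched as follows. This is a hypothetical illustration, not the adaptation of [KRS4]. Terms are triples (operator, left operand, right operand). may_alias and must_alias map a variable to the set of variables it may or must alias. transp and comp are simplified stand-ins for the predicates Transp and Comp.

```python
def transp(assigned, term, may_alias):
    """A term is transparent at a node unless the node assigns an operand
    or a variable that may alias an operand (uses may-alias information)."""
    _, l, r = term
    hit = set(assigned)
    return not any(v in hit or (hit & may_alias.get(v, set())) for v in (l, r))


def comp(computed, term, must_alias):
    """A term counts as computed at a node if the node computes a possibly
    lexically different term whose operands must-alias the corresponding
    operands of term (uses must-alias information)."""
    op, l, r = term

    def same(x, y):
        return x == y or y in must_alias.get(x, set())

    return any(o == op and same(l, cl) and same(r, cr) for o, cl, cr in computed)


T = ("+", "a", "b")
print(transp(["y"], T, {"b": {"y"}}))            # False: y may alias b
print(comp([("+", "x", "b")], T, {"a": {"x"}}))  # True: x must alias a
```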



^ We recall that call by name and call by reference coincide as long as there are 
no complex data structures. 

^ Intuitively, “control flow free” means that a statement can be followed by any
other statement of a procedure. In contrast, “control flow dependent” means that 
the intraprocedural flow of control is respected. 

^ Intuitively, “syntactically” means that only lexically identical program terms are considered equivalent (cf. [BC1, Dh1, Dh2, Dh3, DRZ, DS1, DS2, KRS1, KRS2, MR1, So, St1, St2]). In contrast, “semantically” based code motion algorithms also consider lexically different program terms equivalent if they are semantically equivalent (cf. [BA, Cl, KRS7, RL1, RL2, RWZ, St3, SKR1, SKR2]).
12.3 Future Work

12.3.1 The Framework Side 

The framework has been developed for the family of Algol-like programming languages. Thus, it is settled in the imperative programming paradigm, but it is not limited to it. For example, the HO-DFA dealing with formal procedure calls yields a natural interface to other programming paradigms, in particular to the functional and the object-oriented one. Its similarity to constraint-based methods for closure and safety or type analysis of functional and object-oriented languages is obvious (cf. [Ag1, Ag2, Ay, Bon, OPS, Pal1, Pal2, PC, PS1, PS2, PS3, PS4]), though the details are different. Current approaches to these problems concentrate mainly on correctness and efficiency, whereas precision in the style of Theorem 6.4.1 is not considered systematically. It is worth investigating to which extent the methods used in proving the correctness and precision of our HO-DFA carry over to the problem of closure and safety analysis of functional and object-oriented languages. This also holds for considering formal procedure calls as higher order branch statements during IDFA. As mentioned earlier, formal procedure calls are thereby automatically treated in a most recent fashion. This directly suggests applying this technique to LISP and its derivatives, to which, in contrast to Algol-like programming languages, the dynamic scope rules apply. As demonstrated in [SF1, SF2], extensions to the logic programming paradigm are possible as well.

Besides this link to other programming paradigms indicated by the HO-DFA, it is also worthwhile to adapt the abstract interpretation based approach underlying our framework, which is centred around a coincidence theorem, to both the object-oriented and the parallel programming paradigm. It is likewise worthwhile to develop variants tailored for conceptually quite different object-oriented languages like Smalltalk, C++, and Oberon, and for parallel languages like Parallel-C, High Performance Fortran, or CCS- and CSP-like languages. First results concerning the analysis of explicitly parallel programs with interleaving semantics and shared memory can be found in [KSV1, KSV2]. There it has been shown how to construct precise analysis algorithms for unidirectional bit-vector problems, which can easily be implemented and are as efficient as their counterparts for sequential programs. This is highly relevant in practice because of the broad scope of applications of bit-vector problems. They range from simple analyses like liveness, availability, very busyness, reaching definitions, and definition-use chains (cf. [He]) to more sophisticated and powerful program optimizations. Examples of the latter are code motion (cf. [DS2, DRZ, KRS1, KRS2]), partial dead code elimination (cf. [KRS3]), assignment motion (cf. [KRS4]), and strength reduction (cf. [KRS5]). All these techniques require only unidirectional bit-vector analyses, and can now be made available for parallel programs, too. In [Kn2, Kn3] and [KSV3] this has been demonstrated for partial dead code elimination and partial redundancy elimination, respectively. On the object-oriented side, we adapted in a first step the abstract interpretation based approach to a Smalltalk- and an Oberon-like language. This extension offers a conceptually new approach to type analysis, which is often more powerful than related previous techniques (cf. [Ag1, Ag2, OPS, PC, PS3, PS4]). Additionally, it opens the object-oriented setting for classical optimizations (cf. [KG, KS4]). Central for both the parallel and the object-oriented extension was to extend the framework of abstract interpretation accordingly and to prove specific coincidence theorems.

In addition, there are also several worthwhile starting points for refining and enhancing the framework for the imperative programming paradigm itself. An important point, and one that is straightforward to realize, concerns the algorithm for computing the IMFP-solution. In the version presented here, it consists of a preprocess computing the abstract semantics of procedures and a main process computing the IMFP-solution based on the results of the preprocess. It should be replaced by a new computation procedure which interleaves both steps and computes the semantics of procedure call nodes by need only. Though this does not improve the worst-case time complexity of computing the IMFP-solution, it can in practice be expected to yield a dramatic performance gain. Similar in spirit would be to derive a variant of the framework for “demand-driven” data flow analysis in the fashion of [DGS1, DGS2]. The point of this approach is to answer so-called data flow queries. Intuitively, this means answering whether a given data flow information can be assured at a specific program point. In practice, data flow queries can be answered quite efficiently, as only small fractions of the program have to be considered.
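The interleaved, by-need computation of procedure semantics can be sketched for a single bit-vector term. Every transfer function then has the form x ↦ gen ∨ (x ∧ ¬kill) and is therefore closed under composition. The Python sketch below is hypothetical: PROCS, summary and compose are illustrative names, and procedures are straight-line statement lists. Recursion is excluded, so memoization replaces the fixpoint iteration a recursive program would require. The sketch shows only the by-need aspect: the summary of a procedure is computed the first time a call node needs it, and procedures never reached are never summarized.

```python
from functools import lru_cache

# Hypothetical program: each procedure is a straight-line list of statements.
# ("stmt", gen, kill) is an ordinary statement for one term; ("call", q) calls q.
PROCS = {
    "main": [("stmt", False, False), ("call", "p")],
    "p": [("stmt", True, False), ("call", "q")],
    "q": [("stmt", False, True)],
    "unused": [("stmt", True, False)],
}
summarized = []  # records which procedure summaries were actually computed


def compose(f, g):
    """Transfer function 'g after f', each written as (gen, kill) with
    f(x) = gen or (x and not kill)."""
    (g1, k1), (g2, k2) = f, g
    return (g2 or (g1 and not k2), k1 or k2)


@lru_cache(maxsize=None)
def summary(proc):
    """Abstract semantics of proc, computed only when a call needs it.
    Assumes no recursion; recursive procedures would need a fixpoint."""
    summarized.append(proc)
    f = (False, False)  # identity function
    for stmt in PROCS[proc]:
        g = summary(stmt[1]) if stmt[0] == "call" else stmt[1:]
        f = compose(f, g)
    return f


print(summary("main"), summarized)  # (False, True) ['main', 'p', 'q']
```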

In addition to these pragmatically motivated modifications, the following 
topics are of major importance to further enhance the framework: 

1. Extending the framework to functions, i.e., to procedures returning the 
result of a computation. 

2. Enhancing the framework by means of refined alias and higher order data 
flow analyses. 

3. Identifying new practically relevant classes of programs, for which the 
sfmr-property can efficiently be checked. 

4. Developing new criteria guaranteeing the sfmr-property of a program, 
which can efficiently be verified. 

5. Extending the framework to capture programs lacking the sfmr-property. 

The first four points can be dealt with in a rather straightforward manner. The last one, however, requires additional care, because formal procedure calls can no longer be treated as nondeterministic higher order branch statements without additional precautions. The current treatment of ordinary and formal procedure calls in the framework suggests the following approach. Operationally, the control flow is enhanced by a stack component which allows bookkeeping of the static environment in which a formal procedure call is invoked during the analysis. Denotationally, the computation of the semantics of ordinary and formal procedure calls is superposed by a process analogous to the one for computing the semantics of procedure calls (cf. Definition 8.3.2). The practicality of this approach remains to be investigated. In particular, introducing a kind of control flow stack suggests that the approach would be limited to programs with a regular formal call tree, because otherwise the control flow stack can grow arbitrarily. This limitation, however, would have a direct analogue in the field of program verification: a correct and relatively complete Hoare-system for an Algol-like programming language C exists if and only if every program of C has a regular formal call tree (cf. [Ol1]).



12.3.2 The Application Side 

In this section we sketch a variety of further practically relevant applications which can be made available for imperative programming languages by means of our stack-based framework. They range from bit-vector based optimizations (cf. [He]) over constant propagation (cf. [Ke1, Ki1, Ki2, RL1, RL2, WZ1, WZ2]), strength reduction (cf. [ACK, CcK, CP, Dh4, DD, JD1, JD2, Pa1, Pa2, PK]), and semantically based code motion (cf. [BA, Cl, RWZ]) to the automatic parallelization of sequential programs with procedures (cf. [BC2, Le, LY, Wo, ZC]).

Following the presentation of Section 11, the framework can easily be used for solving the classical bit-vector problems, like computing available (up-safe) expressions, very busy (down-safe) expressions, live variables, and reaching definitions. In [KRS4] this is demonstrated for a set-up with value and reference parameters. We recall that bit-vector analyses are practically most important because of the broad scope of powerful program optimizations relying on analyses of this type. All these analyses can now be adapted to the interprocedural setting. For example, along the lines of [KRS3] the algorithm for interprocedural lazy code motion could be enhanced to an algorithm for interprocedural lazy strength reduction. Such an algorithm would be unique in uniformly combining code motion and strength reduction interprocedurally. The same holds for the algorithms of partial dead code elimination and assignment motion of [KRS5] and [KRS6], which also only require unidirectional bit-vector analyses. They can thus be generalized to the interprocedural setting in the same fashion. In [KM2] this has been demonstrated for a similar application in the context of data parallel languages, called distribution assignment placement (cf. [KM1]). It aims at avoiding unnecessary distribution assignments in High Performance Fortran programs. In essence, this application combines partial dead code elimination and partially redundant assignment elimination. Usually, the combination of optimizations is the source of intricate interdependencies (cf. [Cl, ClCo, WS2]), and this holds for distribution assignment placement as well. The interdependencies and second-order effects of its two component transformations introduce intricate problems into optimality considerations (cf. [GKLRS, KM1, KM2]). This makes distribution assignment placement particularly interesting from both a theoretical and a practical point of view. In [Rü], the impact of interacting code motion transformations on the transformational power and the algorithmic complexity of an application has been investigated in detail for a related (intraprocedural) application scenario. It is an important matter of future work to investigate how the techniques and results developed in [Rü] can be adapted and applied to the application scenario of [KM1] and its interprocedural extension of [KM2]. The same holds for the stronger notion of lifetime optimality introduced in [Rü]. In contrast to the lifetime notion considered here, it takes the lifetimes of all temporaries simultaneously into account, requiring that their cumulated lifetimes are minimized. In [Rü] an algorithm is presented which achieves this notion of lifetime optimality for intraprocedural code motion. It is based on busy and lazy code motion, and thus it is promising to investigate its extensibility to the interprocedural setting.

Furthermore, intraprocedural reaching definition analysis is another example of a practically relevant application which can be enhanced interprocedurally in the same fashion. This allows the construction of interprocedural definition-use chains (cf. [HS]). They are important for an (interprocedural) dependence analysis of a program (cf. [FO, FOW, RS]), which is a common prerequisite of the automatic parallelization of sequential programs. In addition, definition-use chains are also useful for a simple form of (interprocedural) constant propagation.
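The intraprocedural core of this application can be sketched in a few lines. The following hypothetical Python code (all names are assumptions) computes reaching definitions by round-robin iteration and derives definition-use chains from them. It then uses the chains for the simple form of constant propagation mentioned above: a variable is treated as constant at a use if all its reaching definitions assign the same constant. An interprocedural version would additionally have to propagate definitions across call and return edges.

```python
def reaching_definitions(succ, defs):
    """Definitions (identified by their node) reaching the entry of each node.
    defs maps a node to the variable it assigns."""
    nodes = set(succ) | {m for ms in succ.values() for m in ms}
    preds = {n: [] for n in nodes}
    for n, ms in succ.items():
        for m in ms:
            preds[m].append(n)
    inn = {n: frozenset() for n in nodes}

    def out(n):
        v = defs.get(n)
        if v is None:
            return inn[n]
        # Kill the other definitions of v, generate this one.
        return frozenset(d for d in inn[n] if defs[d] != v) | {n}

    changed = True
    while changed:
        changed = False
        for n in nodes:
            new = frozenset().union(*(out(p) for p in preds[n]))
            if new != inn[n]:
                inn[n], changed = new, True
    return inn


def def_use(inn, defs, uses):
    """Definition-use chains: (use node, variable) -> reaching definitions."""
    return {(n, v): {d for d in inn[n] if defs[d] == v}
            for n, vs in uses.items() for v in vs}


# Hypothetical program: d1: x := 2; then either d2: y := 3 or d3: y := 3; u uses x, y.
succ = {"d1": ["d2", "d3"], "d2": ["u"], "d3": ["u"]}
defs = {"d1": "x", "d2": "y", "d3": "y"}
value = {"d1": 2, "d2": 3, "d3": 3}  # constant right-hand sides
chains = def_use(reaching_definitions(succ, defs), defs, {"u": ["x", "y"]})
consts = {v: {value[d] for d in ds}.pop()
          for (_, v), ds in chains.items() if len({value[d] for d in ds}) == 1}
print(consts)  # {'x': 2, 'y': 3}
```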

A different, much more powerful approach to interprocedural constant propagation (cf. [CC2, CCpKT, JM, RHS, SRH1, SRH2]) can be based on the decidable class of finite constants introduced in [SK1, SK2]. Finite constants are optimal for acyclic programs. In this respect, the decision procedure for finite constants generalizes and improves on all previous techniques for (intraprocedural) constant propagation. The interprocedural extension of this algorithm rests on the fact that finite constants have two characterizations. One is purely operational, in the sense of the meet over all paths approach. The other is purely denotational, in the sense of the maximal fixed point approach, and yields a computation procedure. The coincidence of both characterizations is a consequence of the coincidence theorem.

Analogously to the interprocedural extension of finite constants, the semantically based algorithms for code motion and strength reduction of [SKR1, SKR2, KS3, KRS7] can also be extended interprocedurally. One should note, however, a difference in complexity. The bit-vector based interprocedural versions of lazy code motion and lazy strength reduction have the same worst-case time complexity as their intraprocedural counterparts, but this does not hold for the semantically based algorithms. In practice, the additional computational effort must be carefully balanced against the enlarged transformational power of the algorithms.

In addition to the applications sketched above, which are based on our work, a large variety of further applications has been reported in the literature, both intraprocedural and interprocedural. Muchnick provides an up-to-date survey (cf. [Mu]). The stack-based framework can serve as a uniform platform for (re-)formulating and comparing algorithms designed for related applications. Moreover, because of the large spectrum of language features supported by the stack-based framework, it may often allow these algorithms to be enhanced accordingly, thereby improving their transformational power.

12.3.3 Generators for IDFA and Optimization 

In this section we consider a practically important recent trend in optimization: the construction of tools for the automatic generation of data flow analyses and optimizations from concise specifications. For other compiler phases, like lexical and syntactic analysis or code generation, support by powerful generators has been state-of-the-art for years. Optimizers, in contrast, are usually still hand-coded. Of course, this is an expensive and error-prone process. Early approaches to improve on this situation concentrated on peephole optimizations, which are particularly simple as they do not require global program analyses (cf. [DF, Kes]). They have been complemented by approaches which address the generation of global analyses but do not support the construction of (complex) program transformations (cf. [AGL, AM, CDG, HMGR, KH, YH]). More recently, systems have been introduced which support both the construction of global analyses and of optimizations based on them (cf. [A61, A62, TH, VF, WS1, WS3]). However, their application ranges differ. The systems of [A61, A62, TH, VF] concentrate on “classical” intraprocedural optimizations. The system of [WS1, WS3], in contrast, is particularly well-suited for local transformations based on data dependency information, which is particularly important for the automatic parallelization of sequential programs.

The stack-based framework presented here is quite appropriate as a common basis for constructing an interprocedural generator for data flow analyses and optimizations. A prototype, the DFA&OPT-METAFrame tool kit [KKKS, KRS4], has been implemented as a part of the METAFrame-system [SMGB], whose computational kernel is the fixed-point analysis machine of [Kl, SGKKM]. The prototype has recently been complemented by a control-flow analysis generator, which automatically generates flow graphs and flow graph systems from the abstract syntax tree of the underlying program [BKK]. It supports data flow analysis and optimization of intraprocedural and interprocedural programs along the lines of the framework here, and of parallel programs along the lines of [KSV1, KSV2]. It applies particularly well to optimizations based on bit-vector analyses, like code motion, assignment motion, partial dead code elimination, strength reduction, and (via definition-use chains) also constant propagation and constant folding. Both interprocedural and parallel DFA-problems are specified in terms of three ingredients, corresponding to the cookbook presentation of the framework here. These are the local semantic functions giving abstract semantics to statements, the direction of the data flow, and the kind of fixpoint desired. Alternatively, intraprocedural data flow analyses can also be specified in terms of modal logic formulas, as proposed in [St1, St2]. First practical experiences show that the generated DFA-algorithms are as efficient as their hand-coded counterparts. A high level programming language allows us to combine the results of different analyses for specifying program optimizations, and is thus the connecting link between program analysis and optimization. The generator has successfully been tested on a large sample of the program optimizations mentioned above. For interprocedural data flow analysis, the current version of the generator supports programs composed of procedures with global variables. An extension to local variables and to value, reference, and procedure parameters, which would cover the full scope of the framework presented in this monograph, is in progress.
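To make the specification style concrete, here is a hypothetical miniature of such a generator; it is not the DFA&OPT-METAFrame interface. A problem is given by a local semantic function per node, a direction, a meet, a boundary value, and an initial value. Choosing top or bottom as the initial value selects a maximal or minimal fixpoint. generate returns an intraprocedural round-robin solver, and Spec and generate are illustrative names. The usage instantiates the solver for live variables.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, FrozenSet, List

Fact = FrozenSet[str]


@dataclass
class Spec:
    """Hypothetical DFA specification: local semantics, direction, meet, values."""
    transfer: Callable[[str, Fact], Fact]  # local semantic function of a node
    direction: str                         # "forward" or "backward"
    meet: Callable[[Fact, Fact], Fact]
    boundary: Fact                         # value at nodes without sources
    initial: Fact                          # top -> maximal, bottom -> minimal fixpoint


def generate(spec: Spec) -> Callable[[Dict[str, List[str]]], Dict[str, Fact]]:
    """Return a round-robin solver for the specified problem.

    The result maps each node to the information before its transfer function
    is applied: at its entry (forward) or at its exit (backward).
    """
    def solve(succ: Dict[str, List[str]]) -> Dict[str, Fact]:
        nodes = set(succ) | {m for ms in succ.values() for m in ms}
        sources: Dict[str, List[str]] = {n: [] for n in nodes}
        for n, ms in succ.items():
            for m in ms:
                if spec.direction == "forward":
                    sources[m].append(n)   # information flows n -> m
                else:
                    sources[n].append(m)   # information flows m -> n
        val = {n: spec.initial if sources[n] else spec.boundary for n in nodes}
        changed = True
        while changed:
            changed = False
            for n in nodes:
                if not sources[n]:
                    continue
                new = reduce(spec.meet,
                             (spec.transfer(s, val[s]) for s in sources[n]))
                if new != val[n]:
                    val[n], changed = new, True
        return val
    return solve


# Usage: live variables (backward, union, least fixpoint).
USES = {"n1": set(), "n2": {"a"}, "n3": {"b"}}
DEFS = {"n1": {"a", "b"}, "n2": {"c"}, "n3": set()}
liveness = generate(Spec(
    transfer=lambda n, out: frozenset(USES[n] | (out - DEFS[n])),
    direction="backward",
    meet=frozenset.union,
    boundary=frozenset(),
    initial=frozenset(),
))
live_out = liveness({"n1": ["n2"], "n2": ["n3"]})
print(sorted(live_out["n1"]))  # ['a', 'b']
```

Only the specification changes from one problem to the next. An availability analysis, for instance, would supply direction "forward", intersection as meet, and the full term set as the initial value.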